Learn Data Science using Python Step by Step

Here is how you can learn Data Science using Python step by step. Please feel free to reach out to me at my personal email, rpdatascience@gmail.com, if you have any questions or comments on any of the topics.


  1. Setup Python environment
  2. How to start jupyter notebook
  3. Install and check Packages
  4. Arithmetic operations
  5. Comparison or logical operations
  6. Assignment and augmented assignment in Python
  7. Variables naming conventions
  8. Types of variables in Python and typecasting
  9. Python Functions
  10. Exception handling in Python
  11. String manipulation and indexing
  12. Conditional and loops in Python
  13. Python data structure and containers
  14. Introduction to Python Numpy
  15. Introduction to Python SciPy
  16. Introduction to Python Pandas
  17. Python pivot tables
  18. Pandas join tables
  19. Missing value treatment
  20. Dummy coding of categorical variables 
  21. Basic statistics and visualization
  22. Data standardization or normalization
  23. Linear Regression with scikit-learn (Machine Learning library)
  24. Lasso, Ridge and Elastic Net Regularization in GLM
  25. Logistic Regression with scikit-learn (Machine Learning library)
  26. Hierarchical clustering with Python
  27. K-means clustering with Scikit Python
  28. Decision trees using Scikit Python
  29. Principal Component Analysis (PCA) using Scikit Python- Dimension Reduction
  30. Linear Discriminant Analysis (LDA) using Scikit Python- Dimension Reduction and Classification
  31. Market Basket Analysis or Association Rules or Affinity Analysis or Apriori Algorithm
  32. Recommendation Engines using Scikit-Surprise
  33. Price Elasticity of Demand using Log-Log Ordinary Least Square (OLS) Model
  34. Timeseries Forecasting using Facebook Prophet Package
  35. Model Persistence and Productionalization Using Python Pickle
  36. Deep Learning- Introduction to deep learning and environment setup
  37. Deep Learning- Multilayer perceptron (MLP) in Python
  38. Deep Learning- Convolution Neural Network (CNN) in Python
  39. Other topics (coming soon)



Auto Regressive Integrated Moving Average (ARIMA) Time Series Forecasting


Autoregressive Integrated Moving Average (ARIMA) is one of the most popular techniques for time series modeling. It is also called the Box-Jenkins method, named after the statisticians who pioneered some of the key developments in this technique.

We will focus on the following broad areas-

  • What is a time series? We have covered this in another article. Click here
  • Explore time series data. Please refer to slides 2 to 7 of the below deck and Click here
  • What is ARIMA modeling
  • Discuss stationarity of a time series
  • Fit an ARIMA model, evaluate the model’s accuracy and forecast for the future

What is ARIMA modeling-

An ARIMA model has the following main components. However, not every model needs to have all of the components mentioned below.

  • Autoregressive (AR)

The value of a time series at time period t (y_t) is a function of its values at the previous ‘p’ time periods.

y_t = linear function of (y_{t-1}, y_{t-2}, ..., y_{t-p}) + error

  • Integrated (I)

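To make a time series stationary (discussed below), we sometimes need to difference successive observations and model the differenced series instead. This step is known as differencing, and the differencing order is represented as ‘d’ in an ARIMA model. The term “integrated” refers to the reverse step: the forecasts of the differenced series are summed (integrated) to get back to the original scale.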

  • Moving Average (MA)

The value of a time series at time period t (y_t) is a function of the errors at the previous ‘q’ time periods.

y_t = linear function of (E_{t-1}, E_{t-2}, ..., E_{t-q}) + error

Based on the combinations of the above components, we can have the following models, among others-

  • AR- Only autoregressive terms
  • MA- Only moving average terms
  • ARMA- Both autoregressive and moving average terms
  • ARIMA- Autoregressive, moving average and integration terms. After the differencing step, the model becomes ARMA

A general ARIMA model is represented as ARIMA(p, d, q), where p, d and q are the orders of the autoregressive, integrated (differencing) and moving average parts respectively. Each of p, d and q is an integer greater than or equal to zero.
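To make the AR and MA definitions above concrete, here is a minimal Python sketch of a one-step forecast for each. The function names, coefficients and sample values are made up for illustration; a real model would estimate the coefficients from data.

```python
def ar_forecast(history, phi, const=0.0):
    """One-step AR(p) forecast: const + phi[0]*y[t-1] + ... + phi[p-1]*y[t-p]."""
    p = len(phi)
    recent = history[-p:][::-1] if p else []  # most recent value first
    return const + sum(c * y for c, y in zip(phi, recent))


def ma_forecast(past_errors, theta, mean=0.0):
    """One-step MA(q) forecast: mean + theta[0]*e[t-1] + ... + theta[q-1]*e[t-q]."""
    q = len(theta)
    recent = past_errors[-q:][::-1] if q else []  # most recent error first
    return mean + sum(c * e for c, e in zip(theta, recent))


# Hypothetical AR(2): 0.5 * 3.0 + 0.25 * 2.0
ar_value = ar_forecast([1.0, 2.0, 3.0], phi=[0.5, 0.25])

# Hypothetical MA(2): 10.0 + 0.5 * (-0.2) + 0.5 * 0.4
ma_value = ma_forecast([0.4, -0.2], theta=[0.5, 0.5], mean=10.0)

print(ar_value, ma_value)
```

An ARMA model simply adds the two linear parts together, and an ARIMA model applies the same idea to the differenced series.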

Stationarity of a time series- 

A time series is called stationary when it has a constant mean and variance across the time period, i.e. the mean and variance don’t depend on time. In other words, it should not show any trend or any change in the spread of the data over a period of time. White noise is the simplest example of a stationary series.
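A rough way to see this definition in action is to compare the mean and variance of the first and second halves of a series. The sketch below is only an informal check on made-up data, not a substitute for a statistical test.

```python
from statistics import mean, pvariance


def half_stats(series):
    """Return (mean, variance) for the first and the second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


trending = [10 + 2 * t for t in range(20)]  # the level keeps rising with time
flat = [10, 12, 10, 12] * 5                 # same level and spread throughout

print(half_stats(trending))  # the two means are far apart
print(half_stats(flat))      # both halves look alike
```

If the two halves give clearly different means or variances, the series is unlikely to be stationary.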

Please refer to slides 8 to 11 of the below deck for live examples of this discussion

From the plot of our air passengers time series, we can tell that the time series is not stationary. Moreover, a time series needs to be stationary or made stationary before being fed into ARIMA modeling.

Statistically, the Augmented Dickey–Fuller (ADF) test is used for testing the stationarity of a time series. The null hypothesis (H0) is that the series is “Non-Stationary” and the alternative hypothesis (Ha) is that the series is “Stationary”.

If the p-value generated from the test is less than 0.05, we can reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

From the ADF test we can see that the p-value is close to 0.78, which is more than 0.05. Hence we fail to reject the null hypothesis, i.e. we treat the series as “Non-Stationary”.

How do we make a time series stationary? Well, we can do it in the following ways-

  • Manual- Transformation, differencing etc. Let’s look at an example.
  • Automated- The Integrated term (d) in the ARIMA model will make it stationary. We will do this in the model fitting phase. Generally speaking, we don’t require d>1 to make a time series stationary
  • auto.arima() will take care of this automatically and fit the best model
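As a small illustration of the differencing step, here is a Python sketch on two made-up series. The helper name `difference` is hypothetical.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'[t] = y[t] - y[t-1]."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


linear = [5 + 3 * t for t in range(6)]   # 5, 8, 11, 14, 17, 20
quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

print(difference(linear, d=1))      # [3, 3, 3, 3, 3]
print(difference(quadratic, d=1))   # [1, 3, 5, 7, 9]
print(difference(quadratic, d=2))   # [2, 2, 2, 2]
```

A linear trend disappears after one round of differencing, while a quadratic trend needs two. Each round shortens the series by one observation.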

Fit a model, evaluate model’s accuracy and forecast

We will use auto.arima() from the forecast package in R to fit the best model, and evaluate model fit and performance using the following main criteria.

Please refer to slides 12-18 of the below deck

A good time series model should have the following characteristics-

  • Residuals shouldn’t show any trends over time.
  • The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of the residuals shouldn’t have large values (beyond the significance level) at any lag. ACF indicates the correlation between the current value and the previous values at each lag. PACF is an extension of ACF, where the correlation due to the intermediate lags is removed. You can read more on this here.
  • Errors shouldn’t show any seasonality
  • Errors should be normally distributed
  • Error (MAE, MAPE, MSE etc.) should be low
  • AIC and BIC should be lower compared to alternative models.
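The error measures in the list above are simple to compute by hand. The following sketch uses made-up actual and predicted values.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0, 400.0]       # hypothetical observed values
predicted = [110.0, 190.0, 400.0]    # hypothetical forecasts

print(mae(actual, predicted), mse(actual, predicted), mape(actual, predicted))
```

MAE and MSE are in the units of the series (or its square), whereas MAPE is scale free, which makes it easier to compare across series.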

The codes and presentation 


Thank you!

Holt Winters Time Series Forecasting

What is a time series?

When we track a certain variable over an interval of time (generally at an equal interval of time) the resulting process is called a time series.

Let’s look at some examples of time series in our daily lives-

1. Closing price of Apple stock on a daily basis will be a time series

Example of Time Series- Apple Stock Price Trend Pulled from Google Finance


2. GDP of the world over the last several decades will again be a time series

Example of Time Series- World GDP Trend Over Last Several Decades from World Bank


3. Similarly, the hourly movement of the Bitcoin prices in a day will be a time series

Example of Time Series- Hourly Bitcoin Prices from Coindesk


As you can see from the above examples, the time interval can vary from one time series to another. It can be minutes, hours, days, weeks, months, quarters, years or any other time period. However, one thing that is common to all time series is that a particular variable is being measured over a period of time.

What is time series modeling?

Time series modeling is a statistical exercise where we try to achieve the following two main objectives-

1. Visualize and understand the pattern of a particular time series. For example, if you are looking at the sales of an eCommerce company, you would like to understand how it has performed over a period of time, in which months sales go higher or lower etc.

2. By looking into the historical pattern, forecast what may happen in the future in that particular time series

What are the business uses of time series modeling?

Time series modeling is used for a variety of different purposes. Some examples are listed below-

1. Forecast sales of an eCommerce company for the next quarter and next one year for financial planning and budgeting

2. Forecast call volume on a given day to efficiently plan resources in a call center

3. Predict trends in the future stock price movement for technical trading of that stock in a stock market

How is time series forecasting different from regression modeling?

One of the biggest differences between time series and regression modeling is that a time series model leverages the past values of the same variable to predict what is going to happen in the future.

On the other hand, a regression model such as a multiple linear regression will predict the value of a certain variable as a function of other variables

Let’s take an example to make this point clearer. If you are trying to predict the sales of an eCommerce company as a function of what the sales have been in the past quarters, this is time series modeling

On the other hand, if you are trying to predict the sales of the same eCommerce company as a function of other variables such as the marketing spend, price of the product and other such contributing factors, it is regression modeling

What are the constituents of a time series?

A time series could be made up of the following main parts-

1. Trend- A systematic pattern of how the time series is behaving over a period of time. For example- GDP of emerging economies such as India is growing over a period of time

2. Seasonality- Peaks and troughs which happen at the same time in every period. For example- sales of retailers in the US go higher during Thanksgiving and Black Friday

3. Random noise- As the name suggests, this is the random pattern in a time series

4. Cyclical- Cycles such as fuel prices going lower during certain periods and higher at others. Generally speaking, a cycle is long in duration and, unlike seasonality, does not have a fixed period.

Please note that not all time series will have all these components.

Let’s look at an example of the time series components. This has been done in R using the decompose() function.

Additive Seasonal Model-

This model is used when the time series shows additive seasonality. For example, an eCommerce company’s sales in October of each year are $2MM higher than the base level sales, regardless of what the base level sales are in that particular year. In a very simplified mathematical equation it can be represented as

Observed = Trend + Seasonal + Random

Please take a look at slides 2 and 3 of the below presentation

Multiplicative Seasonal Model-

This model is used when the time series shows multiplicative seasonality. For example, an eCommerce company’s sales in October of each year are 1.2 times the base level sales in that year. If a particular year has low base level sales, the sales in October will be lower in an absolute sense; however, they will still be 1.2x the base level sales. In a very simplified mathematical equation it can be represented as

Observed = Trend x Seasonal x Random

Please take a look at slide 4 of the below presentation
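The difference between the two seasonal models can be sketched in a few lines of Python, using the October examples from the text. The base-level sales figures below are made up, and the function names are hypothetical.

```python
def additive_october(base_sales, seasonal_lift=2.0):
    """Additive seasonality: October adds a fixed amount (in $MM) to the base level."""
    return base_sales + seasonal_lift


def multiplicative_october(base_sales, seasonal_factor=1.2):
    """Multiplicative seasonality: October scales the base level by a fixed factor."""
    return base_sales * seasonal_factor


for base in (5.0, 10.0, 20.0):  # hypothetical base-level sales, in $MM
    print(base, additive_october(base), multiplicative_october(base))
```

Under the additive model the October lift stays at $2MM whatever the base level is, while under the multiplicative model the lift grows in absolute terms as the base level grows.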

Let’s now fit Exponential Smoothing to the above data example. Holt Winters is one of the most popular techniques for exponential smoothing of time series data. Moreover, we can fit both additive and multiplicative seasonal time series using the HoltWinters() function in R.

There are many parameters that one can pass to this function. However, one doesn’t need to pick these parameters, as R will automatically pick the settings that minimize the squared error between the predicted and the actual values.

The three most important parameters that one needs to pay attention to are-

alpha = Value of smoothing parameter for the base level.
beta = Value of smoothing parameter for the trend.
gamma = Value of smoothing parameter for the seasonal component.

All three of the above parameters range between 0 and 1

  • If beta and gamma are both zero and alpha is non-zero, this is known as Single Exponential Smoothing
  • If gamma is zero but both beta and alpha are non-zero, this is known as Double Exponential Smoothing with trend
  • If all three of them are non-zero, this is known as Triple Exponential Smoothing or Holt Winters with trend and seasonality.
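To show what the alpha parameter does, here is a minimal Python sketch of Single Exponential Smoothing. It assumes the level is initialized at the first observation; the data is made up.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level[t] = alpha * y[t] + (1 - alpha) * level[t-1], starting from level[0] = y[0].
    """
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


data = [10.0, 20.0, 20.0, 20.0]
print(simple_exponential_smoothing(data, alpha=0.5))  # [10.0, 15.0, 17.5, 18.75]
print(simple_exponential_smoothing(data, alpha=1.0))  # [10.0, 20.0, 20.0, 20.0]
```

A higher alpha reacts faster to new observations, while a lower alpha gives a smoother level. Beta and gamma play the same role for the trend and the seasonal component.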

In the below example, we will let R choose the optimized parameters for us.

Additive Seasonal Holt Winters Model

Let’s fit an additive model first and compute MAE. The general form of an additive model is shown below.

y_t = base + linear * t + S_t + random error

where

y_t = forecast at time period t

base = base signal

linear = linear trend component

t = time period

S_t = additive seasonal factor

This is the model that R has fitted for us-

HoltWinters(x = fl, seasonal = "additive")

Smoothing parameters:
alpha: 0.2479595
beta : 0.03453373
gamma: 1

a 477.827781
b 3.127627
s1 -27.457685
s2 -54.692464
s3 -20.174608
s4 12.919120
s5 18.873607
s6 75.294426
s7 152.888368
s8 134.613464
s9 33.778349
s10 -18.379060
s11 -87.772408
s12 -45.827781

See Slide # 11 on how to use the above model output to compute forecast for any given time period. However, you don’t have to do it by hand as R will do it for you. Nevertheless, good to know how to use the above model output.
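As a rough sketch, the fitted additive coefficients above can be turned into a forecast by hand in Python. This assumes that a is the level and b the trend at the end of the data, and that s1 to s12 are the seasonal factors for the next 12 periods in order, so that s1 applies to the first forecast step.

```python
# Coefficients copied from the additive model output above
a, b = 477.827781, 3.127627
s = [-27.457685, -54.692464, -20.174608, 12.919120, 18.873607, 75.294426,
     152.888368, 134.613464, 33.778349, -18.379060, -87.772408, -45.827781]


def additive_forecast(h):
    """Forecast h periods ahead (h >= 1): level + h * trend + seasonal factor."""
    return a + h * b + s[(h - 1) % len(s)]


for h in (1, 2, 13):
    print(h, round(additive_forecast(h), 3))
```

The seasonal factors repeat every 12 periods, so step 13 reuses s1 while the trend part keeps growing.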

Finally, let’s notice that MAE of the additive model comes out to be 9.774438

Multiplicative Seasonal Holt Winters Model

The general form of a multiplicative model is shown below-

y_t = (base + linear * t) * S_t + random error

where

y_t = forecast at time period t

base = base signal

linear = linear trend component

t = time period

S_t = multiplicative seasonal factor

This is the model that R has fitted for us-

HoltWinters(x = fl, seasonal = "multiplicative")

Smoothing parameters:
alpha: 0.2755925
beta : 0.03269295
gamma: 0.8707292

a 469.3232206
b 3.0215391
s1 0.9464611
s2 0.8829239
s3 0.9717369
s4 1.0304825
s5 1.0476884
s6 1.1805272
s7 1.3590778
s8 1.3331706
s9 1.1083381
s10 0.9868813
s11 0.8361333
s12 0.9209877

As you can see from the above output, the seasonality shows that demand for air travel is the highest in July and August of each year and the lowest in November.
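The same reading of the output can be reproduced with a short Python sketch. It assumes that s1 to s12 line up with January to December and apply, in order, to the next 12 forecast steps.

```python
# Coefficients copied from the multiplicative model output above
a, b = 469.3232206, 3.0215391
s = [0.9464611, 0.8829239, 0.9717369, 1.0304825, 1.0476884, 1.1805272,
     1.3590778, 1.3331706, 1.1083381, 0.9868813, 0.8361333, 0.9209877]


def multiplicative_forecast(h):
    """Forecast h periods ahead (h >= 1): (level + h * trend) * seasonal factor."""
    return (a + h * b) * s[(h - 1) % len(s)]


# Rank the seasonal indices (1 = s1, ..., 12 = s12) from highest to lowest factor
ranked = sorted(range(1, 13), key=lambda i: s[i - 1], reverse=True)
print(ranked[:2], ranked[-1])   # [7, 8] 11
print(round(multiplicative_forecast(1), 2))
```

Indices 7 and 8 carry the largest factors and index 11 the smallest, which matches the July, August and November reading above under the stated assumption.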

Moreover the MAE for this model is 8.393662. Therefore, in this case a multiplicative Holt Winters seasonal model is able to provide us a better forecast compared to an additive model.

All the code and output can be found here and in the below presentation.

Here is the forecast generated from the model-

HoltWinters Timeseries in R- Forecast for next 20 months using Multiplicative Model


Thank you!

Analytical Problem Solving- Types of Reasoning

To solve any problem we need to have some way of breaking the problem down. There are two main ways of reasoning to that effect-

  • Deductive Reasoning– This is also called the “Top Down” approach or “Formal Logic” approach. The key here is to form hypotheses to explain a certain phenomenon and then go on to reject or accept those hypotheses. The conclusions and recommendations coming out of this sort of reasoning are more certain and factual in nature.
    • For example, let’s say you are trying to explain why a certain car gives lower miles per gallon. Because you know the business and have more context on this problem, you can start with potential hypotheses-
      • Weight of the car is high
      • Car has a higher number of cylinders
      • Car has higher horsepower
      • and so on…

You will check each of the above hypotheses and reach a definite conclusion.

  • Inductive Reasoning– On the other hand, this is a “Bottom Up” approach or “Informal Logic” approach. This sort of reasoning is more exploratory in nature. The end goal is to form some hypotheses that give possible reasons for a certain phenomenon.
    • For example, let’s say you are trying to explain why the sales of an eCommerce company have gone down in a particular quarter. You may begin with an exploratory analysis of potential driving factors such as-
      • Marketing spend of the company
      • Pricing
      • Competitive landscape
      • Macroeconomic factors

You will do data analysis to correlate each of the above factors with the sales and find potential reasons or build potential hypotheses to be tested further.



Lasso, Ridge and Elastic Net Regularization

Regularization techniques in Generalized Linear Models (GLM) are used during a modeling process for many reasons. A regularized GLM helps in the following main ways-

  1. Doesn’t assume a normal distribution of the dependent variable (DV). The DV can follow distributions such as normal, binomial, Poisson etc. Hence the name Generalized Linear Models (GLMs)
  2. Addresses the variance-bias tradeoff. Generally will lower the variance of the model at the cost of a small increase in bias
  3. More robust handling of multicollinearity
  4. Better handling of sparse or high-dimensional data (observations < features)
  5. Natural feature selection
  6. More accurate prediction on new data, as it reduces overfitting on the training data
  7. Easier interpretation of the output

And so on…

What is a regularization technique, you may ask? In simple terms, a regularization technique is a penalty mechanism which shrinks the coefficients (drives them closer to zero) to build a more robust and parsimonious model. Although there are many ways to regularize a model, a few of the common ones are-

  1. L1 Regularization aka Lasso Regularization– This adds regularization terms to the model which are a function of the absolute values of the coefficients. The coefficients can be driven all the way to zero during the regularization process. Hence this technique can be used for feature selection and for generating a more parsimonious model
  2. L2 Regularization aka Ridge Regularization– This adds regularization terms to the model which are a function of the squares of the coefficients. The coefficients can approach zero but generally never become exactly zero.
  3. Combination of the above two, such as Elastic Nets– This adds regularization terms to the model which are a combination of both L1 and L2 regularization.
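The three penalties, and the different way Lasso and Ridge shrink a coefficient, can be sketched in plain Python. The shrinkage functions below assume the simplest possible setting, a single coefficient with unregularized estimate b, and the elastic net weighting shown is just one common way of mixing the two penalties. All names and numbers are illustrative.

```python
def l1_penalty(coefs, lam):
    """Lasso penalty: lam * sum of absolute coefficient values."""
    return lam * sum(abs(c) for c in coefs)


def l2_penalty(coefs, lam):
    """Ridge penalty: lam * sum of squared coefficient values."""
    return lam * sum(c * c for c in coefs)


def elastic_net_penalty(coefs, lam, l1_ratio):
    """Weighted mix of the L1 and L2 penalties (l1_ratio between 0 and 1)."""
    return l1_ratio * l1_penalty(coefs, lam) + (1 - l1_ratio) * l2_penalty(coefs, lam)


def lasso_shrink(b, lam):
    """Minimizer of 0.5*(x - b)**2 + lam*abs(x): soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Minimizer of 0.5*(x - b)**2 + 0.5*lam*x**2: proportional shrinkage."""
    return b / (1.0 + lam)


coefs = [3.0, -0.5, 0.2]  # hypothetical unregularized coefficients
print([lasso_shrink(b, 1.0) for b in coefs])  # [2.0, 0.0, 0.0]
print([ridge_shrink(b, 1.0) for b in coefs])  # [1.5, -0.25, 0.1]
```

Lasso sets the two small coefficients to exactly zero, which is the feature selection effect, while Ridge only scales every coefficient down.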

For more on the regularization techniques you can visit this paper.

Scikit help on Lasso Regression

Here is a working code example on the Boston Housing data. Please note that it is generally advised to scale the variables before fitting a regularized GLM regression. However, in the below example we are working with the variables on their original scale to demonstrate how each algorithm works.



Model Persistence Using Python Pickle

After you have built a machine learning model which is doing a great job in prediction, you don’t have to retrain your model again and again for future usage. Instead, you can use Python pickle serialization for reusing this model in future and transferring it into a production environment where non modelers can also use this model to make predictions.



By Renee Comet (photographer) [Public domain], via Wikimedia Commons

First let’s look at how Wikipedia defines a pickle

Pickling is the process of preserving or expanding the lifespan of food by either anaerobic fermentation in brine or immersion in vinegar. The resulting food is called a pickle.

Python pickling is the same process without the brine or vinegar: you pickle your model for longer usage without the need to recook it. In a “Pickling” process a Python object is converted into a byte stream. On the other hand, in an “Unpickling” process a byte stream is converted back into a Python object.

I strongly recommend that you read Python Official Documentation on this topic before moving forward.

Now let’s see this live in action. We will first look at a simple example and then look at a model example.

Example 1- In this we will pickle and un-pickle a simple Python list
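A minimal sketch of this example, using only the standard library, could look like the following. The list contents and the file name `numbers.pkl` are made up.

```python
import os
import pickle
import tempfile

numbers = [1, 2, 3, "four", 5.0]

# Pickle to a byte stream in memory and un-pickle it again
payload = pickle.dumps(numbers)
restored = pickle.loads(payload)

# Pickle to a file on disk and load it back, as you would across sessions
path = os.path.join(tempfile.mkdtemp(), "numbers.pkl")
with open(path, "wb") as f:
    pickle.dump(numbers, f)
with open(path, "rb") as f:
    restored_from_file = pickle.load(f)

print(restored == numbers, restored is numbers)  # True False
print(restored_from_file)
```

The un-pickled list is equal to the original but is a new object. The same dump and load calls work for a trained model object. Only un-pickle files that come from a source you trust, since loading a pickle can execute arbitrary code.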


Example 2- In this we will pickle and un-pickle a Decision Tree classifier and use it later for making predictions on a new data


For more details, do check out this excellent presentation.



Recurrent Neural Network (RNN) in Python

Recurrent Neural Networks (RNNs) are a special type of neural network used for sequential data analysis, where inputs are not independent and are not of fixed length, as is assumed in feed-forward networks such as MLP. Rather, in this case, inputs are dependent on each other along the time dimension. In other words, what happens at time ‘t’ may depend on what happened at time ‘t-1’, ‘t-2’ and so on.

These are also called ‘memory’ networks, as previous inputs and states persist in the model to support sequential analysis. They can have both short term and long term time dependence. Due to their ability to handle sequential data well, these networks are typically very suitable for speech recognition, sentiment analysis, forecasting, language translation and other such applications.

Let’s now spend some time looking at how an RNN works-

Recurrent Neural Network (RNN)


As you may recall, in a typical feed-forward neural network the input is fed in at the beginning, the hidden layers do the processing and finally the output layer spits out the output. On the other hand, in an RNN, generally speaking, we will have a different input, output and cost function for each time step. However, the same weight matrices are shared across all time steps in the network.
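The weight sharing across time steps can be sketched with a tiny scalar RNN in Python. The weights w_x, w_h and b below are made-up numbers, and a real network would use matrices and learn them during training.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar RNN over a sequence, reusing the same weights at every time step.

    h[t] = tanh(w_x * x[t] + w_h * h[t-1] + b)
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.5, b=0.0)
print(states)
```

Even though the second and third inputs are zero, the hidden state stays non-zero, because it still carries information from the first input. This is the “memory” of the network.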

One point to note is that RNNs are also trained using backward propagation of errors and gradient descent to minimize the cost function. However, backward propagation in an RNN happens over different time steps and hence it’s called Backpropagation Through Time (BPTT). In a typical RNN, we may have several time steps, which may sometimes range in the hundreds or thousands, and therein lies the problem of vanishing or exploding gradients that plain vanilla RNNs are particularly susceptible to.

There are various techniques, such as gradient clipping (for exploding gradients), and architectures, such as Long Short Term Memory (LSTM) or Gated Recurrent Unit (GRU) (for vanishing gradients), which help in fixing these issues. We will delve deeper into how an LSTM works.

An LSTM network consists of hidden layers that have many LSTM blocks or units. In turn, each LSTM unit will have the following components-
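The following Python sketch illustrates both ideas in a simplified form. It treats the gradient through many time steps as a repeated multiplication by one constant factor, and it clips a gradient vector by its norm. The numbers and the helper name `clip_by_norm` are illustrative only.

```python
import math

# Backpropagating through 100 time steps multiplies the gradient by a similar
# factor at every step. A factor below 1 vanishes, a factor above 1 explodes.
vanishing = 0.9 ** 100
exploding = 1.1 ** 100
print(vanishing, exploding)


def clip_by_norm(grads, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5.0
print(clipped)
```

Clipping keeps the direction of the gradient and only limits its size, which is why it helps with exploding gradients but does nothing for vanishing ones.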

  • Memory Cell- The component that remembers the values over a period of time. This has an activation function
  • Input gate- Controls the addition of info to the memory cell. The gate itself generally has a sigmoid activation, while the candidate values to be added are squashed between -1 and +1 by a tanh activation
  • Forget gate- Controls removing or retaining info from the memory cell. This will generally have a sigmoid activation function and hence the output values will range between 0 and 1. If the gate is fully on (1), all memories are retained. If the gate is turned off (0), all values will be removed.
  • Output gate- Controls the retrieval of information from the memory cell, which is passed through the tanh activation

Long Short Term Memory Cell or Block (Source- Wiki)
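Here is a minimal scalar sketch of a single LSTM step in Python, following the commonly used formulation with sigmoid gates and a tanh candidate. The parameter layout (a dict of gate name to weights) and all numbers are hypothetical, and real implementations work on vectors and matrices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params maps a gate name to (w_x, w_h, bias), all scalars."""
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much of the old memory to keep (0 to 1)
    i = gate("input", sigmoid)         # how much new info to let in (0 to 1)
    g = gate("candidate", math.tanh)   # the new info itself (-1 to +1)
    o = gate("output", sigmoid)        # how much of the memory to expose (0 to 1)

    c = f * c_prev + i * g             # updated memory cell
    h = o * math.tanh(c)               # output / hidden state
    return h, c


# Forget gate pushed fully on, input gate pushed fully off: memory is retained
keep = {"forget": (0.0, 0.0, 100.0), "input": (0.0, 0.0, -100.0),
        "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 100.0)}
h_keep, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, params=keep)

# Forget gate pushed fully off: the old memory is wiped
wipe = dict(keep, forget=(0.0, 0.0, -100.0))
h_wipe, c_wipe = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, params=wipe)

print(c_keep, c_wipe)
```

With the forget gate on, the memory cell stays at about 0.8; with it off, the memory drops to about zero.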

Let’s work through an example which we used in a previous article.



Here is an excellent article in case you want to explore more.



Ensemble Modeling using Python

Ensemble models are a great tool to address the variance-bias trade-off which a typical machine learning model faces, i.e. when you try to lower bias, variance will go higher and vice-versa. This generally results in higher error rates.

Total Error in Model = Bias² + Variance + Random Noise (irreducible error)

Variance and Bias Trade-off


Ensemble models typically combine several weak learners to build a stronger model, which can reduce variance and bias at the same time. Since ensemble models follow a community learning or divide and conquer approach, with majority voting the output from an ensemble model will be wrong only when the majority of the underlying learners are wrong.

One of the biggest flip sides of ensemble models is that they may become a “Black Box” and not be very explainable, as opposed to a simple machine learning model. However, the gains in model performance generally outweigh any loss in transparency. That is the reason why the top performing models in many high ranking competitions are generally ensemble models.
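The majority argument can be quantified with a short Python sketch. It assumes an odd number of learners that make independent errors and all have the same accuracy, which is an idealization; real learners are usually correlated.

```python
from math import comb


def majority_vote_accuracy(n_learners, p_correct):
    """Probability that more than half of n independent learners are correct."""
    needed = n_learners // 2 + 1
    return sum(
        comb(n_learners, k) * p_correct ** k * (1 - p_correct) ** (n_learners - k)
        for k in range(needed, n_learners + 1)
    )


for n in (1, 3, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

With learners that are each right 70% of the time, three voters are right about 78% of the time, and the figure keeps rising as more independent learners are added.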

Ensemble models can be broken down into the following three main categories-

  1. Bagging
  2. Boosting
  3. Stacking

Let’s look at each one of them-


Bagging-

  • One good example of such a model is Random Forest
  • These types of ensemble models work on reducing the variance by removing instability in the underlying complex models
  • Each learner is asked to do the classification or regression independently and in parallel, and then either a voting or an averaging of the output of all the learners is done to create the final output
  • Since these ensemble models are predominantly focused on reducing the variance, the underlying models are fairly complex (such as Decision Tree or Neural Network) so that they begin with low bias
  • An underlying decision tree will have higher depth and many branches. In other words, the tree will be deep and dense, with lower bias
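The mechanics of bootstrap sampling and averaging can be sketched in plain Python. To keep the example short, each “learner” here simply predicts the mean of its own bootstrap sample; this is a stand-in for a real model such as a deep decision tree. The data and function names are made up.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_prediction(data, n_learners=25, seed=0):
    """Train each stand-in learner on its own bootstrap sample and average the outputs."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_learners):
        sample = bootstrap_sample(data, rng)
        predictions.append(sum(sample) / len(sample))  # the learner's prediction
    return sum(predictions) / len(predictions), predictions


data = [12.0, 15.0, 9.0, 30.0, 14.0, 11.0, 13.0, 16.0]
average, individual = bagged_prediction(data)
print(round(average, 2), round(min(individual), 2), round(max(individual), 2))
```

The individual learners disagree with each other because each one saw a different sample, while the averaged prediction is more stable. For classification, the averaging step would be replaced by a vote.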


Boosting-

  • Some good examples of these types of models are Gradient Boosting Tree, AdaBoost and XGBoost, among others.
  • These ensemble models work with weak learners and try to improve the bias and variance simultaneously by working sequentially.
  • These are also called adaptive learners, as the learning of one learner is dependent on how the other learners are performing. For example, if a certain set of the data has a higher misclassification rate, these samples’ weight in the overall learning will be increased so that the other learners focus more on correctly classifying the tougher samples.
  • An underlying decision tree will be shallow and a weak learner with higher bias
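The re-weighting idea can be sketched with the sample weight update used in AdaBoost. The weights and the misclassification flags below are made up, and the function assumes the weighted error is strictly between 0 and 1.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost round: raise the weights of misclassified samples, lower the rest."""
    err = sum(w for w, m in zip(weights, misclassified) if m)   # weighted error rate
    alpha = 0.5 * math.log((1 - err) / err)                     # the learner's say
    updated = [w * math.exp(alpha if m else -alpha)
               for w, m in zip(weights, misclassified)]
    total = sum(updated)
    return [w / total for w in updated], alpha


weights = [0.2] * 5                                  # start with equal weights
misclassified = [False, False, False, False, True]   # the learner missed the last sample
new_weights, alpha = adaboost_reweight(weights, misclassified)
print([round(w, 3) for w in new_weights], round(alpha, 3))
```

After the update, the single misclassified sample carries half of the total weight, so the next learner is pushed to get it right.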

There are various approaches for building a bagging model such as- pasting, bagging, random subspaces, random patches etc. You can find all details over here.


Stacking-

  • These meta learning models are what the name suggests. They are stacked models. In other words, a particular learner’s output will become an input to another model and so on.

Working examples-

First install xgboost via conda install-

Step 1- Search for packages using "anaconda search -t conda xgboost"

Step 2- Install a particular package, such as "conda install py-xgboost"

Coming soon….



Deep Learning- Convolution Neural Network (CNN) in Python

Convolution Neural Networks (CNNs) are particularly useful for spatial data analysis, image recognition, computer vision, natural language processing, signal processing and a variety of other purposes. They are biologically motivated by the way neurons in the visual cortex respond to visual stimuli.

What makes CNNs much more powerful compared to other feed-forward networks for image recognition is the fact that they do not require as much human intervention or as many parameters as some of the other networks such as MLP do. This is primarily driven by the fact that CNNs have neurons arranged in three dimensions (height, width and depth), with each neuron connected only to a small local region of the previous layer.

CNNs make all of this magic happen by taking a set of inputs and passing it on to one or more of the following main hidden layers in a network to generate an output.

  • Convolution Layers
  • Pooling Layers
  • Fully Connected Layers

Click here to see a live demo of a CNN

Let’s dig deeper into utility of each of the above layers.

Convolution Layers– Before we move this discussion any further, let’s remember that any image or similar object can be represented as a matrix of numbers ranging between 0-255. The size of this matrix will be determined by the size of the image in the following fashion-

Height X Width X Channels

Channels = 1 for grey-scale images

Channels = 3 for colored images

For example, say we feed in a grey-scale image which is 28 by 28 pixels. This image will be a matrix of numbers in the below fashion-

28*28*1. Each of the 784 pixels can take any value between 0-255 depending on the intensity of the grey-scale.

Now let’s talk about what happens in a convolution layer. The main objective of this layer is to derive features of an image by sliding a smaller matrix, called a kernel or filter, over the entire image through convolution.

What is convolution? Convolution is taking a dot product between the filter and each local region of the image

Kernels can be of many types, such as edge detection, blob of color, sharpening, blurring etc. You can find some of the main kernels over here. Please note that we specify the number of filters for the network; however, the network will learn the filters themselves on its own during the training process.

As a result of these convolution layers, the network creates a number of feature maps. The number of feature maps depends on the # of filters (kernels), while the size of each feature map depends on the size of the filters, padding (zero padding to preserve size) and strides (steps by which a filter scans the original image). Please note that a non-linear activation function such as ReLU or Tanh is applied at each convolution layer to generate modified feature maps.
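The sliding dot product can be sketched in plain Python. The example below uses no padding and a stride of 1, and, as deep learning libraries typically do, it does not flip the kernel. The tiny image and the edge-detection kernel are made up.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4 x 4 grey-scale image: dark on the left, bright on the right
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds to a dark-to-bright change, left to right

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 510, 0], [0, 510, 0], [0, 510, 0]]
```

The feature map lights up only in the middle column, which is exactly where the vertical edge sits in the image.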
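The effect of filter size, padding and stride on the size of a feature map follows a simple formula, sketched below together with the ReLU activation. The function names are hypothetical.

```python
def feature_map_size(input_size, filter_size, padding=0, stride=1):
    """Output height or width: floor((input + 2*padding - filter) / stride) + 1."""
    return (input_size + 2 * padding - filter_size) // stride + 1


def relu(x):
    """ReLU activation: negative values become zero."""
    return max(0.0, x)


print(feature_map_size(28, 3))              # 26: a 3 x 3 filter without padding
print(feature_map_size(28, 3, padding=1))   # 28: zero padding preserves the size
print(feature_map_size(28, 5))              # 24
print(feature_map_size(28, 2, stride=2))    # 14
print(relu(-3.0), relu(2.5))                # 0.0 2.5
```

The same formula is applied separately to the height and the width of the image.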


Source: https://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/

Pooling Layer– The arrays generated from the convolution layers are generally very big, and hence a pooling layer is used predominantly to reduce the size of the feature maps while retaining their most important aspects. In other words, this facilitates “Downsampling” using algorithms such as max pooling or average pooling etc. Moreover, as the number of parameters in the network is reduced, this layer also helps in avoiding overfitting. It is common to have pooling layers in between different convolution layers.

Fully Connected Layer– In this layer every neuron is connected to all the neurons of the previous layer. It takes the matrix inputs from the previous layers, flattens them and passes them on to the output layer, which in turn makes a prediction such as a classification probability.
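Max pooling and average pooling can be sketched in plain Python as follows. The example uses non-overlapping 2 x 2 windows on a made-up feature map.

```python
def pool2d(feature_map, size=2, reducer=max):
    """Downsample with non-overlapping size x size windows, reduced by `reducer`."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            reducer([feature_map[r + i][c + j] for i in range(size) for j in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def average(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]

print(pool2d(feature_map))                   # [[6, 2], [2, 9]]
print(pool2d(feature_map, reducer=average))  # [[3.5, 1.0], [1.0, 7.25]]
```

A 4 x 4 feature map becomes 2 x 2, so the next layer has a quarter of the values to process.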
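The flatten, fully connected and output steps can be sketched in plain Python. The weights below are made up, and softmax is used here as one common choice for turning the output scores into class probabilities.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat list of values."""
    return [v for fm in feature_maps for row in fm for v in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output uses every input."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert scores to probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_maps = [[[1.0, 2.0], [3.0, 4.0]]]        # one 2 x 2 feature map
inputs = flatten(feature_maps)                   # [1.0, 2.0, 3.0, 4.0]
weights = [[0.1, 0.0, 0.0, 0.0],                 # hypothetical weights for class 0
           [0.0, 0.0, 0.0, 0.1]]                 # hypothetical weights for class 1
scores = dense(inputs, weights, biases=[0.0, 0.0])
probabilities = softmax(scores)
print(inputs, probabilities)
```

The class with the larger score gets the larger probability, and the probabilities always add up to 1.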

Here is an excellent write-up which provides further details on all of the above steps.

Since we know enough about how a CNN works, let’s code now-

In this example, we will be working with the MNIST dataset and building a CNN to recognize handwritten digits from 0-9. We will be using classification accuracy as the metric to evaluate the model’s performance. Please see the link for the MNIST CNN working

Please note that CNNs need a high amount of computational power and memory, and hence it’s recommended that you run this on GPUs or in the cloud. Training on CPUs may be very slow. Furthermore, you may need to reduce the batch size to ensure the algorithm runs successfully.



As you can see, the above model gives 99%+ accuracy in the classification.


Time-series Forecasting Using Facebook Prophet Package

Forecasting is a technique that is used for a variety of different purposes and situations such as sales forecasting, operational and budget planning etc. Similarly, there are a variety of different techniques such as moving averages, smoothing, ARIMA etc. to statistically make a forecast.

In this article we will talk about an open source package called “Prophet” from Facebook, which takes away the complexity of other techniques without compromising on the accuracy of the forecast. The guiding principle of this approach is Generalized Additive Models (GAMs). More on which can be found over here.
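The additive idea behind this approach is that the forecast is the sum of a trend, a seasonal and a holiday component. The plain Python sketch below only illustrates that structure with made-up components; it does not use the Prophet package or reproduce how Prophet estimates each part.

```python
import math


def trend(t, intercept=100.0, slope=0.5):
    """Hypothetical linear trend."""
    return intercept + slope * t


def weekly_seasonality(t, amplitude=10.0):
    """Hypothetical weekly pattern that repeats every 7 days."""
    return amplitude * math.sin(2 * math.pi * t / 7)


def holiday_effect(t, holidays=(10,), lift=25.0):
    """Hypothetical one-off lift on the listed holiday days."""
    return lift if t in holidays else 0.0


def additive_model(t):
    """y(t) = trend(t) + seasonality(t) + holiday(t)"""
    return trend(t) + weekly_seasonality(t) + holiday_effect(t)


for day in (0, 7, 10):
    print(day, round(additive_model(day), 2))
```

Because the components are simply added together, each one can be inspected and adjusted on its own, which is a large part of what makes this style of model easy to work with.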

Let’s look at an example of how to deploy Prophet in Python.