In this article, we will build an Auto ARIMA model using the ‘Pyramid’ package. If you are not familiar with time-series modeling, and ARIMA in particular, please read the two articles below first.
For the ‘Pyramid’ package to install and work correctly, we need the latest versions of the following packages. Please run the version checks below. In particular, make sure that pip is version 18 or higher so that Pyramid-ARIMA installs properly.
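A version check of this kind can be sketched as follows. This is a minimal example, assuming Python 3.8 or later (for `importlib.metadata`); the package list is illustrative rather than the exact list the check above refers to, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Illustrative package list (an assumption, adjust it to your environment).
for name in ["pip", "numpy", "pandas", "scipy", "statsmodels", "scikit-learn"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If pip turns out to be older than 18, `python -m pip install --upgrade pip` upgrades it.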
Import the necessary packages, such as Pyramid-ARIMA and seasonal decomposition, and suppress warnings.
Import IPython's “InteractiveShell” class, which can be configured so that a notebook cell displays the output of every statement rather than only the last one, and load the flights dataset built into the seaborn library.
Here are the outputs generated by the above code-
Create a pivot table using the pandas “pivot_table” method.
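As a rough sketch of this step, the snippet below pivots a tiny hand-made stand-in for the flights data (the column names `year`, `month` and `passengers` match the seaborn dataset; the six rows are only illustrative).

```python
import pandas as pd

# Small stand-in for the seaborn flights data (illustrative rows only).
flights = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Rows are months, columns are years, cells hold passenger counts.
pivot = pd.pivot_table(flights, values="passengers", index="month", columns="year")

# pivot_table sorts string labels alphabetically, so restore calendar order.
pivot = pivot.reindex(["Jan", "Feb", "Mar"])
print(pivot)
```

The default aggregation is the mean, which is harmless here because each month and year pair has exactly one row.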
Here is the output from the pivot table-
We will now generate a heat map to visualize the seasonality of travelers by month. It is easy to see that traffic peaks each year during the summer months of July and August.
Create a date range variable that covers the period of the above data. We use the pandas “date_range” method to create a monthly timestamp variable.
Insert the datetime column into the original data so that it can be used for time series modeling.
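The two steps above might look like the following sketch. It assumes the data covers 144 months starting in January 1949, as the seaborn flights data does; the passenger values are placeholders, and the column name `datetime` is an assumption.

```python
import pandas as pd

# Stand-in for the flights data: 144 monthly rows (placeholder passenger values).
flights = pd.DataFrame({"passengers": range(144)})

# "MS" is month-start frequency: 1949-01-01, 1949-02-01, ...
dates = pd.date_range(start="1949-01-01", periods=len(flights), freq="MS")

# Insert the timestamps as the first column of the DataFrame.
flights.insert(0, "datetime", dates)
print(flights.head())
```

Using `periods=len(flights)` instead of an explicit end date keeps the date range and the data the same length by construction.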
Select only the required variables by name. In this case, we select the datetime and passengers columns using loc.
Here is how the data looks now-
Reindex the data on the datetime variable using the “set_index” method.
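A minimal sketch of the column selection and reindexing, using a few illustrative rows and assumed column names:

```python
import pandas as pd

# A few illustrative rows with the same columns as the prepared flights data.
flights = pd.DataFrame({
    "datetime": pd.date_range("1949-01-01", periods=4, freq="MS"),
    "year": [1949] * 4,
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "passengers": [112, 118, 132, 129],
})

# Keep all rows but only the two named columns.
data = flights.loc[:, ["datetime", "passengers"]]

# Make the timestamps the index so the frame behaves as a time series.
data = data.set_index("datetime")
print(data)
```

With a datetime index in place, date-based slicing such as `data.loc["1949-02":"1949-03"]` becomes available, which is convenient for the train and test split later on.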
Plot the time series data and add an x label, a y label and a title to the plot.
Here is how the time series looks-
Decompose the time series into trend, seasonal and random components using a multiplicative model, and draw the subplots with colors specified as hex codes.
Similarly, you can decompose the time series into trend, seasonal and random components using an additive model with the code below:
decomposition = seasonal_decompose(data, model='additive')
Here is how the additive time-series decomposition looks-
Test the stationarity of the time series using the Augmented Dickey-Fuller test. Since the p-value is higher than alpha, we cannot reject the null hypothesis that the series has a unit root (i.e., is non-stationary). Therefore, a good model will need the “Integrated (I)” term, which means the series has to be differenced.
Split the data into train and test datasets so that the forecast can be validated on the test data.
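For a time series, the split should be chronological rather than random. A minimal sketch follows; the cutoff date is an assumption (the last two years are held out), and the passenger values are placeholders.

```python
import pandas as pd

# Stand-in monthly series indexed by timestamp (placeholder values).
index = pd.date_range("1949-01-01", periods=144, freq="MS")
data = pd.DataFrame({"passengers": range(144)}, index=index)

# Split by date so the test set lies strictly after the training set.
# Label-based slicing on a DatetimeIndex includes both endpoints.
train = data.loc[:"1958-12-01"]
test = data.loc["1959-01-01":]
print(len(train), len(test))
```

Shuffling before splitting would leak future information into the training data, which is why the cut is made at a single point in time.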
Here is how the train and test data look-
Now we can fit an Auto ARIMA model, which uses efficient grid search and random search to find the parameters that give the best-fitting time series model. Please keep in mind that lowercase p, d, q represent the non-seasonal components and uppercase P, D, Q represent the seasonal components.
In this case, we try values from 1 to 8 for each of the above parameters in the parameter search.
Here is the model summary. Among models fitted to the same data, lower AIC and BIC values indicate a better-performing model.
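To make the AIC and BIC comparison concrete, here is a small sketch of the standard formulas, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of observations and L the maximized likelihood. The two candidate fits and their log-likelihoods are made up for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits on the same 120 observations (numbers are made up).
candidates = {
    "small model": {"log_likelihood": -400.0, "n_params": 3},
    "large model": {"log_likelihood": -398.5, "n_params": 7},
}
for name, fit in candidates.items():
    print(
        name,
        round(aic(fit["log_likelihood"], fit["n_params"]), 1),
        round(bic(fit["log_likelihood"], fit["n_params"], 120), 1),
    )
```

In this made-up case the larger model fits slightly better but is penalized for its four extra parameters, so both criteria prefer the smaller model.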
The best performing model from the optimized grid search is the following-
Use the best model to make predictions about the Test data-
Here is how the predicted values for the test period look-
Plot actual vs predicted values for the training and the test data-
Add the predicted values to the original “Test” data to compute the prediction errors. Sample output is shown below as well.
Compute error metrics such as Mean Absolute Error, Mean Squared Error and Median Absolute Error.
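The three metrics can be sketched with plain NumPy as below. The actual and predicted values are made up for illustration; scikit-learn's `mean_absolute_error`, `mean_squared_error` and `median_absolute_error` compute the same quantities.

```python
import numpy as np

# Made-up actual and predicted values, purely for illustration.
actual = np.array([100.0, 120.0, 130.0, 150.0])
predicted = np.array([110.0, 115.0, 135.0, 140.0])

errors = actual - predicted
mae = np.mean(np.abs(errors))       # Mean Absolute Error
mse = np.mean(errors ** 2)          # Mean Squared Error
medae = np.median(np.abs(errors))   # Median Absolute Error

print(f"MAE={mae}, MSE={mse}, MedAE={medae}")
```

MSE penalizes large misses more heavily than MAE, while the median absolute error is the least sensitive of the three to a few badly predicted months.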
Now we can generate model diagnostic metrics as discussed in the links at the very beginning of this article. Overall, model performance looks quite robust from these charts.
In sum, we built a time-series model using the Pyramid-ARIMA package and used an optimized grid search to find the best parameters. Overall, the performance of this model is quite satisfactory.
Thanks for reading! Please share with others.