Linear regression is one of the most fundamental machine learning techniques. For more on linear regression fundamentals, click here. In this blog, we will build a regression model in Python to predict house prices from independent variables such as crime rate, percentage of lower-status population, and quality of schools. We will use the scikit-learn library and its built-in data set called “Boston”.
Let’s now jump into how to build a multiple linear regression model in Python.
Image 1- Importing Packages and Boston Dataset
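A minimal sketch of the import and loading step, assuming an older scikit-learn: `load_boston` was removed in scikit-learn 1.2, so this only works on earlier versions. The helper name `load_boston_frame` is hypothetical and used here for illustration only.

```python
# Column names of the classic Boston housing data (13 features + target).
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
TARGET_NAME = "MEDV"  # median value of owner-occupied homes, in $1000s


def load_boston_frame():
    """Return the Boston data as one DataFrame (hypothetical helper).

    Assumption: scikit-learn < 1.2 is installed. On newer versions the
    import below raises ImportError because load_boston was removed.
    """
    import pandas as pd
    from sklearn.datasets import load_boston

    raw = load_boston()
    df = pd.DataFrame(raw.data, columns=FEATURE_NAMES)
    df[TARGET_NAME] = raw.target
    return df
```

Because the loader is not available everywhere, the later sketches in this post use a small synthetic stand-in table with a few of the same column names.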
Image 2- Explore Boston Dataset
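A rough sketch of a first look at the data. The table below is a synthetic stand-in (made-up numbers, not the real Boston data) that borrows three of its column names, so the printed values are for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Boston data (assumption: values are generated).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, n),     # average rooms per dwelling
    "LSTAT": rng.uniform(2, 35, n),    # % lower-status population
    "CRIM": rng.exponential(3.0, n),   # per-capita crime rate
})
df["MEDV"] = (5 * df["RM"] - 0.6 * df["LSTAT"] - 0.1 * df["CRIM"]
              + rng.normal(0, 2, n))   # target: home value

print(df.shape)                 # number of rows and columns
print(df.head())                # first few records
print(df.describe().round(2))   # summary statistics per column
print(df.isnull().sum())        # missing values per column
```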
Image 3- Creating Features and Labels and Running Correlations
Image 4- Creating Features and Labels and Running Correlation Heatmap
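One way to sketch the feature/label split and the correlation step, again on a synthetic stand-in table rather than the real data. The `plot_heatmap` helper is hypothetical and assumes matplotlib and seaborn are installed; it is defined but not called here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Boston data (assumption: values are generated).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, n),
    "LSTAT": rng.uniform(2, 35, n),
    "CRIM": rng.exponential(3.0, n),
})
df["MEDV"] = (5 * df["RM"] - 0.6 * df["LSTAT"] - 0.1 * df["CRIM"]
              + rng.normal(0, 2, n))

# Features (independent variables) and label (dependent variable).
X = df.drop(columns="MEDV")
y = df["MEDV"]

# Pairwise Pearson correlations, then each column's correlation with the label.
corr = df.corr()
print(corr["MEDV"].sort_values(ascending=False))


def plot_heatmap(corr_matrix):
    """Draw the correlation matrix as a heatmap (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()
```

Strong correlations between two features, as opposed to between a feature and the label, are an early hint of multicollinearity.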
Image 5- Test/Train Split, Linear Regression Model Fitting and Model Evaluation
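The split, fit and evaluation steps could look roughly like this. The data is a synthetic stand-in, and the 75/25 split and `random_state=42` are assumptions, not necessarily the settings used in the original screenshot.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data (assumption: values are generated).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, n),
    "LSTAT": rng.uniform(2, 35, n),
    "CRIM": rng.exponential(3.0, n),
})
df["MEDV"] = (5 * df["RM"] - 0.6 * df["LSTAT"] - 0.1 * df["CRIM"]
              + rng.normal(0, 2, n))

X = df.drop(columns="MEDV")
y = df["MEDV"]

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit ordinary least squares on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("Coefficients:", dict(zip(X.columns, model.coef_.round(3))))
print("Intercept:", round(model.intercept_, 3))
print("MSE:", round(mse, 3), "RMSE:", round(rmse, 3), "R^2:", round(r2, 3))
```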
Image 6- Appending Predicted Data and Plotting the Errors
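A sketch of attaching predictions to the test rows and inspecting the errors, using the same synthetic stand-in data and assumed split as before. The `plot_errors` helper is hypothetical, assumes matplotlib is installed, and is not called here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data (assumption: values are generated).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, n),
    "LSTAT": rng.uniform(2, 35, n),
    "CRIM": rng.exponential(3.0, n),
})
df["MEDV"] = (5 * df["RM"] - 0.6 * df["LSTAT"] - 0.1 * df["CRIM"]
              + rng.normal(0, 2, n))

X = df.drop(columns="MEDV")
y = df["MEDV"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Append actual values, predictions and errors to the test features.
results = X_test.copy()
results["actual"] = y_test
results["predicted"] = model.predict(X_test)
results["error"] = results["actual"] - results["predicted"]
print(results.head())


def plot_errors(frame):
    """Scatter the errors against the predictions (hypothetical helper)."""
    import matplotlib.pyplot as plt

    plt.scatter(frame["predicted"], frame["error"], alpha=0.6)
    plt.axhline(0, color="black", linewidth=1)
    plt.xlabel("Predicted value")
    plt.ylabel("Error (actual - predicted)")
    plt.show()
```

If the errors fan out as the predicted value grows, that pattern suggests heteroscedasticity.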
The metrics above show that this plain vanilla regression model does a decent job overall. However, it can likely be improved, either through feature engineering (for example binning) and fixes for multicollinearity and heteroscedasticity, or by using more robust techniques such as Elastic Net, Ridge regression, SGD regression, or non-linear models.
Image 7- Mean Squared Error (MSE) Definition
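MSE is the average of the squared differences between actual and predicted values: MSE = (1/n) * sum((y_i - yhat_i)^2). A small hand-rolled sketch, with made-up numbers chosen so the arithmetic is easy to follow (the function name `mse_score` is illustrative):

```python
import numpy as np


def mse_score(y_true, y_pred):
    """Mean of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Made-up illustrative values, not taken from the Boston data.
actual = [20, 30, 40, 50]
predicted = [22, 29, 40, 47]

# Errors are -2, 1, 0, 3, so the squares are 4, 1, 0, 9 and their mean is 3.5.
mse = mse_score(actual, predicted)
print(mse)  # 3.5
```

Because the errors are squared, a few large misses dominate the score, and the result is in squared units of the target.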
Image 8- Mean Absolute Percent Error (MAPE)
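MAPE expresses the average absolute error as a percentage of the actual value: MAPE = (100/n) * sum(|y_i - yhat_i| / |y_i|). A minimal sketch with made-up numbers (the function name `mape_score` is illustrative):

```python
import numpy as np


def mape_score(y_true, y_pred):
    """Mean absolute percentage error, returned in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Made-up illustrative values, not taken from the Boston data.
actual = [20, 30, 40, 50]
predicted = [22, 29, 40, 47]

# Relative errors are 0.10, 0.0333..., 0 and 0.06; their mean is about 4.83%.
mape = mape_score(actual, predicted)
print(round(mape, 2))
```

Recent scikit-learn versions also ship `sklearn.metrics.mean_absolute_percentage_error`, which returns a fraction rather than a percentage.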
Model Evaluation Metrics
Image 9- Fitting Linear Regression Model using Statsmodels
Image 10- OLS Regression Output
Image 11- Fitting Linear Regression Model with Significant Variables
Image 12- Heteroscedasticity Consistent Linear Regression Estimates
More details on the metrics can be found at the links below:
Here is a blog with an excellent explanation of all the metrics