Python Machine Learning: Linear Regression with Scikit-learn

Linear regression is one of the most fundamental machine learning techniques in Python. For more on linear regression fundamentals, click here. In this blog, we will build a regression model to predict house prices from independent variables such as crime rate, % lower status population, quality of schools, etc. We will be leveraging the Scikit-learn library and its built-in dataset called “Boston”.

Let’s now jump into how to build a multiple linear regression model in Python.

Import packages and Boston dataset

Image 1- Importing Packages and Boston Dataset

Explore Boston Dataset

Image 2- Explore Boston Dataset

Creating Features and Labels and Running Correlations

Image 3- Creating Features and Labels and Running Correlations
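Since the screenshot is not reproduced here, below is a minimal sketch of this step. Note that `load_boston` was removed from scikit-learn in version 1.2, so the sketch uses a small synthetic stand-in instead of the real data. The column names (CRIM, LSTAT, PTRATIO, MEDV) follow the Boston naming convention, but the values and the coefficients used to generate them are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Boston data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "CRIM": rng.exponential(3.0, n),       # crime rate
    "LSTAT": rng.uniform(2.0, 35.0, n),    # % lower status population
    "PTRATIO": rng.uniform(12.0, 22.0, n), # pupil-teacher ratio (school quality proxy)
})
# Hypothetical price relationship plus noise, used only to have a target.
df["MEDV"] = (50.0 - 0.3 * df["CRIM"] - 0.5 * df["LSTAT"]
              - 1.0 * df["PTRATIO"] + rng.normal(0.0, 2.0, n))

# Features (independent variables) and label (the value we predict).
X = df.drop(columns="MEDV")
y = df["MEDV"]

# Correlation of every feature with the label, most negative first.
corr_with_price = df.corr()["MEDV"].drop("MEDV").sort_values()
print(corr_with_price)
```

Features that correlate strongly with the label are natural candidates for the model, while features that correlate strongly with each other hint at multicollinearity.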

Creating Features and Labels and Running Correlation Heatmap

Image 4- Creating Features and Labels and Running Correlation Heatmap

Test/Train Split, Linear Regression Model Fitting and Model Evaluation

Image 5- Test/Train Split, Linear Regression Model Fitting and Model Evaluation
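A rough sketch of the split, fit and evaluate sequence could look like the following. It runs on synthetic stand-in data (an assumption, not the Boston data itself), and the 70/30 split and `random_state` are illustrative choices rather than the settings used in the screenshot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): three features and a noisy linear target.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.exponential(3.0, n),     # crime-rate-like feature
    rng.uniform(2.0, 35.0, n),   # % lower-status-like feature
    rng.uniform(12.0, 22.0, n),  # pupil-teacher-ratio-like feature
])
y = 50.0 - 0.3 * X[:, 0] - 0.5 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(0.0, 2.0, n)

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit on the training rows only, then predict the held-out rows.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Coefficients:", model.coef_)
print("Test MSE:", mse, "Test R^2:", r2)
```

Evaluating on rows the model never saw during fitting is what makes the MSE and R² an honest estimate of how it would do on new houses.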

Appending Predicted Data and Plotting the Errors

Image 6- Appending Predicted Data and Plotting the Errors
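One way to append the predictions to the test data and derive the errors is sketched below, again on synthetic stand-in data. The column names `actual`, `predicted` and `error` are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: NOT the real Boston dataset).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "CRIM": rng.exponential(3.0, n),
    "LSTAT": rng.uniform(2.0, 35.0, n),
    "PTRATIO": rng.uniform(12.0, 22.0, n),
})
y = (50.0 - 0.3 * X["CRIM"] - 0.5 * X["LSTAT"]
     - 1.0 * X["PTRATIO"] + rng.normal(0.0, 2.0, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Append actual values, predictions and errors next to the test features.
results = X_test.copy()
results["actual"] = y_test
results["predicted"] = model.predict(X_test)
results["error"] = results["actual"] - results["predicted"]
print(results.head())
```

With matplotlib installed, `results.plot.scatter(x="predicted", y="error")` would draw the error plot. Ideally the points scatter evenly around zero with no visible pattern.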

You can see from the above metrics that, overall, this plain vanilla regression model is doing a decent job. However, it can be improved significantly, either through feature engineering (such as binning) and fixes for multicollinearity and heteroscedasticity, or by leveraging more robust techniques such as Elastic Net, Ridge Regression, SGD Regression or non-linear models.

Mean Squared Error (MSE)

Image 7- Mean Squared Error (MSE) Definition
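As a quick worked example, MSE is the average of the squared differences between actual and predicted values. The numbers below are made up for illustration.

```python
import numpy as np

# Made-up actual and predicted values (assumption, for illustration only).
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Errors are 0.5, 0.0, -1.5, -1.0; their squares are 0.25, 0.0, 2.25, 1.0.
squared_errors = (y_true - y_pred) ** 2

# MSE is the mean of the squared errors: 3.5 / 4 = 0.875.
mse = squared_errors.mean()
print(mse)
```

Because the errors are squared, a single large miss weighs far more heavily than several small ones.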

Mean Absolute Percent Error (MAPE)

Image 8- Mean Absolute Percent Error (MAPE)
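MAPE averages the absolute errors expressed as a percentage of the actual values. Here is a small sketch with made-up numbers. It assumes no actual value is zero, since the metric is undefined in that case.

```python
import numpy as np

# Made-up actual and predicted values (assumption, for illustration only).
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

# Absolute errors relative to the actual values: 0.10, 0.10, 0.00.
abs_pct_errors = np.abs((y_true - y_pred) / y_true)

# MAPE is their mean, expressed as a percentage (about 6.67%).
mape = abs_pct_errors.mean() * 100
print(round(mape, 2))
```

Unlike MSE, MAPE is scale-free, which makes it easier to compare across targets measured in different units.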

Model Evaluation Metrics

Fitting Linear Regression Model using Statsmodels

Image 9- Fitting Linear Regression Model using Statsmodels

OLS Regression Output

Image 10- OLS Regression Output

Fitting Linear Regression Model with Significant Variables

Image 11- Fitting Linear Regression Model with Significant Variables

Heteroscedasticity Consistent Linear Regression Estimates

Image 12- Heteroscedasticity Consistent Linear Regression Estimates

More details on the metrics can be found at the links below:

Wiki

Here is a blog with an excellent explanation of all the metrics

Cheers!
