Linear regression is one of the most fundamental machine learning techniques, and it is straightforward to implement in Python. For more on linear regression fundamentals, click here. In this blog, we will build a regression model to predict house prices from independent variables such as crime rate, % lower status population, quality of schools, etc. We will be leveraging the Scikit-learn library and its built-in data set called “Boston”.

Let’s now jump into how to build a multiple linear regression model in Python.

Image 1- Importing Packages and Boston Dataset

Image 2- Explore Boston Dataset
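The import and exploration steps might look roughly like the sketch below. One caveat: `load_boston` was removed from scikit-learn in version 1.2, so the sketch falls back to a small synthetic frame when the loader is unavailable. The fallback values are made up for illustration; only the column names (`CRIM`, `RM`, `LSTAT`, `MEDV`) come from the real data set.

```python
import numpy as np
import pandas as pd

try:
    # Works only on scikit-learn releases older than 1.2.
    from sklearn.datasets import load_boston

    raw = load_boston()
    df = pd.DataFrame(raw.data, columns=raw.feature_names)
    df["MEDV"] = raw.target  # median home value in $1000s
except ImportError:
    # Assumption: synthetic stand-in with a few of the real column names.
    rng = np.random.default_rng(0)
    n = 100
    df = pd.DataFrame({
        "CRIM": rng.exponential(3.0, n),     # per-capita crime rate
        "RM": rng.normal(6.3, 0.7, n),       # average rooms per dwelling
        "LSTAT": rng.uniform(2.0, 35.0, n),  # % lower status population
    })
    df["MEDV"] = 5.0 * df["RM"] - 0.6 * df["LSTAT"] + rng.normal(0.0, 3.0, n)

# Basic exploration: size, first rows, summary statistics, missing values.
print(df.shape)
print(df.head())
print(df.describe().T)
print(df.isnull().sum())
```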

Image 3- Creating Features and Labels and Running Correlations

Image 4- Creating Features and Labels and Running Correlation Heatmap
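As a rough sketch of the features, labels and correlation step, the snippet below separates the target from the predictors and ranks the predictors by their correlation with price. The data is a synthetic stand-in (an assumption, not the original Boston values), built so that `LSTAT` lowers the price and `RM` raises it.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Boston data (made-up values, real column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "CRIM": rng.exponential(3.0, n),     # per-capita crime rate
    "RM": rng.normal(6.3, 0.7, n),       # average rooms per dwelling
    "LSTAT": rng.uniform(2.0, 35.0, n),  # % lower status population
})
df["MEDV"] = (
    5.0 * df["RM"] - 0.6 * df["LSTAT"] - 0.2 * df["CRIM"]
    + rng.normal(0.0, 3.0, n)
)

# Features (X) are the independent variables; the label (y) is the price.
X = df.drop(columns="MEDV")
y = df["MEDV"]

# Pearson correlation of each feature with the target, strongest first.
corr_with_target = (
    df.corr()["MEDV"].drop("MEDV").sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

If seaborn is installed, `seaborn.heatmap(df.corr(), annot=True)` draws the full correlation matrix as a heatmap.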

Image 5- Test/Train Split, Linear Regression Model Fitting and Model Evaluation
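A minimal sketch of the split, fit and evaluate step follows. It again uses synthetic data as a stand-in, and the 70/30 split and `random_state` are assumptions rather than the settings from the original image.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features standing in for crime rate, rooms and % lower status.
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([
    rng.exponential(3.0, n),
    rng.normal(6.3, 0.7, n),
    rng.uniform(2.0, 35.0, n),
])
y = 5.0 * X[:, 1] - 0.6 * X[:, 2] - 0.2 * X[:, 0] + rng.normal(0.0, 3.0, n)

# Hold out 30% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}  RMSE: {np.sqrt(mse):.2f}  R^2: {r2:.3f}")
```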

Image 6- Appending Predicted Data and Plotting the Errors
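Appending the predictions to a data frame makes the errors easy to inspect. The sketch below uses a handful of made-up actual and predicted prices purely for illustration; in practice these columns would come from the test set and the fitted model.

```python
import pandas as pd

# Hypothetical actual and predicted prices (in $1000s), for illustration only.
results = pd.DataFrame({
    "actual": [24.0, 21.6, 34.7, 33.4, 36.2],
    "predicted": [30.0, 25.0, 30.6, 28.6, 27.9],
})

# Error per row, and the same error as a percentage of the actual price.
results["error"] = results["actual"] - results["predicted"]
results["abs_pct_error"] = results["error"].abs() / results["actual"] * 100

print(results.round(2))
```

With matplotlib installed, `results.plot.scatter(x="predicted", y="error")` gives a quick residual plot. A visible funnel or curve in that plot hints at heteroscedasticity or a missing non-linear term.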

You can see from the above metrics that, overall, this plain vanilla regression model is doing a decent job. However, it can be improved upon significantly, either through feature engineering (such as binning) and fixes for multicollinearity and heteroscedasticity, or by leveraging more robust techniques such as Elastic Net, Ridge Regression, SGD Regression, or non-linear models.
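To give a feel for the regularized alternatives, here is a small sketch comparing ordinary least squares with Ridge and Elastic Net using 5-fold cross-validation. The data is synthetic with two nearly collinear features, and the `alpha` and `l1_ratio` values are arbitrary assumptions that would normally be tuned. Whether regularization actually helps depends on the data.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 2.0 * x3 + rng.normal(size=n)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

scores = {}
for name, estimator in models.items():
    # Scale features first so the penalty treats all coefficients alike.
    pipeline = make_pipeline(StandardScaler(), estimator)
    # Mean 5-fold cross-validated R^2; higher is better.
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```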

Image 7- Mean Squared Error (MSE) Definition
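For reference, the standard definition of MSE, with $y_i$ the actual value, $\hat{y}_i$ the predicted value and $n$ the number of observations, is:

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

Taking the square root gives the RMSE, which is expressed in the same units as the target.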

Image 8- Mean Absolute Percent Error (MAPE)
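MAPE averages the absolute errors after expressing each one as a percentage of the actual value. Both metrics are easy to compute by hand, as in this small sketch (the helper names `mse` and `mape` and the sample numbers are illustrative):

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mape(y_true, y_pred):
    """Mean absolute error as a percentage of the actual values.

    Undefined when any actual value is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

# Squared errors are 100, 400 and 0; percentage errors are 10%, 10% and 0%.
mse_value = mse(y_true, y_pred)
mape_value = mape(y_true, y_pred)
print(f"MSE: {mse_value:.2f}")
print(f"MAPE: {mape_value:.2f}%")
```

Because MAPE divides by the actual value, it is unit-free but becomes unstable when actual values are close to zero.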

Model Evaluation Metrics

Image 9- Fitting Linear Regression Model using Statsmodels

Image 10- OLS Regression Output

Image 11- Fitting Linear Regression Model with Significant Variables

Image 12- Heteroscedasticity Consistent Linear Regression Estimates

More details on the metrics can be found at the links below:

Wiki

Here is a blog with an excellent explanation of all the metrics

Cheers!
