Here are a few key algorithm implementations in R:
- Linear Regression
- Logistic Regression
- Decision Trees
- Market Basket Analysis
- Sentiment Analysis
- Clustering
Cheers!
What is “Linear Regression”?
Linear regression is one of the simplest and yet most powerful machine learning algorithms. It is used when the relationship between the dependent variable and one or more independent variables is assumed to be linear, in the following fashion:
Y = b0 + b1*X1 + b2*X2 + b3*X3 + …
Here Y is the dependent variable and X1, X2, X3, etc. are the independent variables. The purpose of building a linear regression model is to estimate the coefficients b0, b1, b2, et cetera that give the lowest prediction error. More on error metrics later in this article.
In the above equation, b0 is the intercept, b1 is the coefficient for variable X1, b2 is the coefficient for variable X2, and so on.
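As a quick illustration (with made-up coefficients, purely for the sake of the arithmetic), the equation above can be evaluated directly:

```python
# Hypothetical fitted coefficients: intercept b0 and slopes b1, b2
b0, b1, b2 = 2.0, 0.5, -1.5

# One observation with two independent variables X1 and X2
X1, X2 = 4.0, 1.0

# Y = b0 + b1*X1 + b2*X2
Y = b0 + b1 * X1 + b2 * X2
print(Y)  # → 2.5
```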
What are “Simple Linear Regression” and “Multiple Linear Regression”?
When we have only one independent variable, the resulting regression is called a “Simple Linear Regression”. When we have two or more independent variables, the resulting regression is called a “Multiple Linear Regression”.
What are the requirements for the dependent and independent variables in the regression analysis?
The dependent variable in linear regression is generally numerical and continuous, such as sales in dollars, GDP, unemployment rate, pollution level, amount of rainfall, etc. The independent variables, on the other hand, can be either numeric or categorical. However, please note that categorical variables need to be dummy coded before they can be used to build a regression model in Python’s sklearn library.
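Dummy coding can be done with pandas, for example. This is a hedged illustration with made-up column names and values:

```python
import pandas as pd

# Hypothetical data with one numeric and one categorical feature
df = pd.DataFrame({
    "sqft": [1500, 2000, 1200],
    "city": ["Austin", "Boston", "Austin"],
})

# Dummy-code the categorical column; drop_first avoids perfect collinearity
# (one level becomes the implicit baseline)
dummies = pd.get_dummies(df, columns=["city"], drop_first=True)
print(dummies)
```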
What are some real-world uses of linear regression?
As we discussed earlier, this is one of the most commonly used algorithms in ML. Some of the use cases are listed below-
Example 1-
Predict the sales amount of a car company as a function of the # of models, new models, price, discount, GDP, interest rate, unemployment rate, competitive prices, etc.
Example 2-
Predict weight gain/loss of a person as a function of calorie intake, junk food, genetics, exercise time and intensity, sleep, festival time, diet plans, medicines, etc.
Example 3-
Predict house prices as a function of sqft, # of rooms, interest rate, parking, pollution level, distance from city center, population mix etc.
Example 4-
Predict GDP growth rate as a function of inflation, unemployment rate, investment, new businesses, weather patterns, resources, population, etc.
How do we evaluate linear regression model’s performance?
There are many metrics that can be used to evaluate a linear regression model’s performance and choose the best model. Some of the most commonly used metrics are-
Mean Square Error (MSE)- This is an error measure, so the lower the better. It is the average of the squared differences between actual and predicted values: MSE = (1/n) * Σ(y_i − ŷ_i)²
Mean Absolute Percent Error (MAPE)- This is also an error measure, so the lower the better. It is the average absolute error expressed as a percentage of the actual value: MAPE = (100/n) * Σ|(y_i − ŷ_i) / y_i|
R Square– This is called the coefficient of determination and gauges the model’s explanatory power. For example, a linear regression model with an R Square of 0.70 (70%) implies that 70% of the variation in the dependent variable can be explained by the model that has been built.
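These three metrics can be computed directly from their definitions. Here is a small sketch with made-up actuals and predictions:

```python
import numpy as np

# Hypothetical actuals and predictions, purely to illustrate the metrics
y_true = np.array([3.0, 5.0, 8.0, 10.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0])

# Mean Square Error: average of squared residuals (lower is better)
mse = np.mean((y_true - y_pred) ** 2)

# Mean Absolute Percent Error: average |error| relative to the actual value
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# R Square: share of the variance in y explained by the model
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, mape, r2)  # → 0.625 12.29… 0.913…
```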
How do we build a linear regression model in Python?
In this exercise, we will build a linear regression model on the Boston housing dataset, which ships with Python’s scikit-learn library. However, before we go down the path of building a model, let’s talk about some of the basic steps in any machine learning model in Python.
In most cases, any machine learning algorithm in the sklearn library follows the same basic steps: load the data, split it into training and test sets, instantiate the estimator, fit it on the training data, predict on the test data, and evaluate the predictions.
So let’s get started with building this model-
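The original post walks through this with screenshots. As a runnable sketch of the same flow: note that `load_boston` was removed from recent versions of scikit-learn, so this version substitutes a synthetic regression dataset generated with `make_regression`; every other step is unchanged.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Get data (synthetic stand-in for the Boston housing dataset)
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3. Instantiate the estimator
model = LinearRegression()

# 4. Fit on the training data
model.fit(X_train, y_train)

# 5. Predict on held-out data and evaluate with MSE and R Square
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R Square:", r2_score(y_test, y_pred))
```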
As you can see from the metrics above, this plain-vanilla regression model does a decent job overall. However, it can be significantly improved upon, either by feature engineering (such as binning, or fixes for multicollinearity and heteroscedasticity) or by leveraging more robust techniques such as Elastic Net, Ridge Regression, SGD Regression, or non-linear models.
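The regularized alternatives mentioned above are drop-in replacements in sklearn, since they share the same fit/predict API. A hedged sketch on synthetic data (SGD is gradient-based, so it is wrapped with feature scaling):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Ridge, SGDRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)

# Each estimator can be swapped in for LinearRegression with no other changes
estimators = {
    "Ridge": Ridge(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=0.1),
    "SGDRegressor": make_pipeline(StandardScaler(), SGDRegressor(random_state=0)),
}
for name, est in estimators.items():
    scores = cross_val_score(est, X, y, cv=5, scoring="r2")
    print(name, round(scores.mean(), 3))
```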
—
Image 9- Fitting Linear Regression Model using Statsmodels
Image 10- OLS Regression Output
Image 11- Fitting Linear Regression Model with Significant Variables
Image 12- Heteroscedasticity Consistent Linear Regression Estimates
More details on these metrics can be found at the links below-
Here is a blog with an excellent explanation of all the metrics.
Cheers!