Lasso, Ridge and Elastic Net Regularization

Regularization techniques in Generalized Linear Models (GLM) are used during a modeling process for many reasons. A regularization technique helps in the following main ways-

  1. Doesn’t assume any particular distribution of the dependent variable ( DV). The DV can follow any distribution such as normal, binomial, possison etc. Hence the name Generalized Linear Models (GLMs)
  2. Address Variance-Bias Tradeoffs. Generally will lower the variance from the model
  3. More robust to handle multicollinearity
  4. Better sparse data (observations < features) handling
  5. Natural feature selection
  6. More accurate prediction on new data as it minimizes overfitting on the train data
  7. Easier interpretation of the output

And so on…

What is a regularization technique you may ask? A regularization technique is in simple terms a penalty mechanism which applies shrinkage (driving them closer to zero) of coefficient to build a more robust and parsimonious model. Although there are many ways to regularize a model, few of the common ones are-

  1. L1 Regularization aka Lasso Regularization– This add regularization terms in the model which are function of absolute value of the coefficients of parameters. The coefficient of the paratmeters can be driven to zero as well during the regularization process. Hence this technique can be used for feature selection and generating more parsimonious model
  2. L2 Regularization aka Ridge Regularization – This add regularization terms in the model which are function of square of coefficients of parameters. Coefficient of parameters can approach to zero but never become zero.
  3. Combination of the above two such as Elastic Nets– This add regularization terms in the model which are combination of both L1 and L2 regularization.

For more on the regularization techniques you can visit this paper.

Scikit help on Lasso Regression

Here is a working example code on the Boston Housing data. Please note, generally before doing regularized GLM regression it is advised to scale variables. However, in the below example we are working with the variables on the original scale to demonstrate each algorithms working.



Data Standardization or Normalization

Data standardization or normalization plays a critical role in most of the statistical analysis and modeling. Let’s spend sometime to talk about the difference between the standardization and normalization first.

Standardization is when a variable is made to follow the standard normal distribution ( mean =0  and standard deviation = 1). On the other hand, normalization is when a variable is fitted within a certain range ( generally between 0 and 1). Here are more details of the above.

Let’s now talk about why we need to do the standardization or normalization before many statistical analysis?

  1. In a multivariate analysis when variables have widely different scales, variable(s) with higher range may overshadow the other variables in analysis. For example, let’s say variable X has a range of 0-1000 and variable Y has a range of 0-10. In all likelihood, variable Y will outweigh variable X due to it’s higher range. However, if we standardize or normalize the variable, then we can overcome this issue.
  2. Any algorithms which are based on distance computations such as clustering, k nearest neigbour ( KNN), principal component ( PCA) will be greatly affected if you don’t normalize the data
  3. Neural networks and deep learning networks also need the variables to be normalized for converging faster and giving more accurate results
  4. Multivariate models may become more stable and the coefficients more reliable if you normalize the data
  5. It provides immunity from the problem of outliers

Let’s look at a Python example on how we can normalize data-