Ridge and Lasso Regression - Comparative Study

  • May 06, 2021
  • 7 Minutes Read

The early phases of both established businesses and startups depend on several years of data. By analyzing an organization's expenses and profit, we can gain insight into its prospects for future growth.

With regression analysis, we can help the finance and investment departments of multinational businesses not only to evaluate risks for their company but also to analyse trends in rival businesses and use them to their advantage. Regression analysis can also help predict a company's sales based on parameters such as GDP growth, previous sales, market growth, or other conditions.

What is regression?

Regression analysis is a predictive modelling technique that evaluates the relation between a dependent variable (i.e. the target variable) and one or more independent variables. It can be used for forecasting, time series modelling, finding the relation between variables, and predicting continuous values. For example, the relation between a household's region and its electricity bill is well suited to study through regression.

There are two basic types of linear regression: simple linear regression and multiple linear regression. For more complicated data and analysis we use non-linear methods such as polynomial regression. Simple linear regression uses only one independent variable to predict the dependent variable Y, whereas multiple linear regression uses two or more independent variables.

The general equations of these regression models are as follows:

  1. Simple linear regression: Y = a + b*X + u
  2. Multiple linear regression: Y = a + b1*X1 + b2*X2 + b3*X3 + ... + bt*Xt + u


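  • Y = the dependent variable.
  • X = the independent variable (X1 ... Xt in the multiple case).
  • a = the intercept of the regression line.
  • b = the slope (b1 ... bt are the coefficients of X1 ... Xt).
  • u = the regression residual (error term).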
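As a minimal sketch of the multiple linear regression equation, the snippet below estimates a and b by ordinary least squares with NumPy. The data is hypothetical and noise-free, generated from Y = 2 + 3*X1 - 1*X2, so the fit should recover those values and leave a residual u of essentially zero.

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 2 + 3*X1 - 1*X2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so the intercept a is estimated with the slopes
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b = coef[0], coef[1:]

# u is the residual: the part of Y the fitted line does not explain
u = Y - (a + X @ b)
print("a =", a, "b =", b)
```

With real, noisy data the residual u would not vanish, and its size is what the residual sum of squares measures.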

[Figure: Linear Regression]

Lasso Regression

The word “LASSO” stands for Least Absolute Shrinkage and Selection Operator. Lasso regression uses regularization to build its predictions. It is often preferred over unregularized regression when many features are present, because the penalty can reduce overfitting and so improve prediction accuracy. Lasso regression uses a shrinkage technique, in which the coefficient estimates are shrunk towards a central point, namely zero. The lasso encourages simple, sparse models (i.e. models with fewer parameters), which makes it well suited to data showing high levels of multicollinearity, or to cases where we would like to automate parts of model selection such as variable selection or parameter elimination.

The lasso regression algorithm uses the L1 regularization technique. It is worth considering when there is a large number of features, because it performs feature selection automatically.



Mathematical equation of Lasso Regression Algorithm:

Residual Sum of Squares + λ * (Sum of the absolute value of the coefficients)
The equation looks like: 

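$$\sum_{i=1}^{n}\Big(Y_i - a - \sum_{j=1}^{t} b_j X_{ij}\Big)^2 + \lambda \sum_{j=1}^{t} |b_j|$$

Here i runs over the n observations and j over the t features.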


  • λ = the amount of shrinkage.
  • If λ = 0, all the features are considered and the model is equivalent to linear regression, in which only the residual sum of squares is used to build the predictive model.
  • As λ approaches ∞, more and more coefficients are set to zero and their features eliminated; at λ = ∞ no feature is used.
  • As λ increases, the bias increases.
  • As λ decreases, the variance increases.
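The effect of λ on feature elimination can be sketched with scikit-learn's Lasso, which names the shrinkage parameter alpha. The data below is hypothetical, and scikit-learn scales the residual term by 1/(2n), so its alpha is not numerically identical to λ in the formula above; only the trend matters here.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data: only the first two of five features influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

# scikit-learn names the shrinkage parameter alpha rather than lambda
alphas = [0.0001, 0.1, 1.0, 100.0]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha).fit(X, y)
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))
    print(f"alpha={alpha}: {nonzero_counts[-1]} non-zero coefficients")
```

For a very small alpha the fit is close to plain linear regression, while for a very large alpha every coefficient is driven to zero and no feature is used.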

Lasso Regression Implementation in Python using sklearn

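import numpy as np
from sklearn.linear_model import Lasso
# x_train, y_train, x_cv, y_cv are assumed to be already-prepared, scaled splits
# (the normalize= argument was removed in scikit-learn 1.2, so scale beforehand)
lassoReg = Lasso(alpha=0.3)
# training the model
lassoReg.fit(x_train, y_train)
pred_cv = lassoReg.predict(x_cv)
# calculating mse
mse = np.mean((pred_cv - y_cv)**2)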


Ridge Regression

Ridge regression is another type of regression algorithm in data science and is usually considered when there is a high correlation between the independent variables. When multicollinearity is present, the least squares estimates remain unbiased, but their variances become large, so the estimates can lie far from the true values. Ridge regression therefore adds a small amount of bias to the estimates in order to reduce that variance. It is a useful regression method because the model is less susceptible to overfitting, which can help when the dataset is small.


The cost function for ridge regression algorithm is:

$$\sum_{i=1}^{n}\Big(Y_i - a - \sum_{j=1}^{t} b_j X_{ij}\Big)^2 + \lambda \sum_{j=1}^{t} b_j^2$$

Here, λ is the penalty parameter. It corresponds to the alpha parameter of the Ridge function, so by changing the value of alpha we control the penalty term. The greater the value of alpha, the higher the penalty and the smaller the magnitude of the coefficients.

We can conclude that ridge regression shrinks the parameters. It is therefore used to mitigate the effects of multicollinearity, and it also reduces model complexity by shrinking the coefficients.
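The shrinking effect can be illustrated with a small NumPy sketch. It assumes centred data with no intercept, in which case minimising the residual sum of squares plus λ times the sum of squared coefficients has the closed-form solution b = (XᵀX + λI)⁻¹Xᵀy. The helper name ridge_coefficients and the data are hypothetical.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Minimise ||y - X b||^2 + lam * ||b||^2 for centred data (no intercept)."""
    n_features = X.shape[1]
    # Solve (X^T X + lam * I) b = X^T y rather than forming an explicit inverse
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical data: two strongly correlated features plus one independent one
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100), rng.normal(size=100)])
y = X @ np.array([1.0, 1.0, 2.0]) + 0.1 * rng.normal(size=100)
X = X - X.mean(axis=0)
y = y - y.mean()

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_coefficients(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>6}: ||b|| = {norm:.3f}")
```

The length of the coefficient vector falls as λ grows, and with λ = 0 the result coincides with ordinary least squares.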

Bias and variance trade-off

The bias and variance trade-off is generally complicated when we build ridge regression models on an actual dataset with multiple features. However, the general trends we can take note of are:

  • As λ increases, the bias increases.
  • As λ decreases, the variance increases.
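One common way to balance the two in practice is to let cross-validation pick the penalty. The sketch below uses scikit-learn's RidgeCV on a hypothetical synthetic dataset; the grid of candidate alphas and the 5-fold setting are illustrative assumptions rather than recommended values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Hypothetical synthetic dataset standing in for real training data
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Candidate penalty strengths, from weak to strong
candidate_alphas = np.logspace(-3, 3, 13)

# 5-fold cross-validation scores each alpha on held-out folds
model = RidgeCV(alphas=candidate_alphas, cv=5).fit(X, y)
print("selected alpha:", model.alpha_)
```

An alpha that is too small leaves the variance high, and one that is too large adds too much bias; the held-out folds indicate where the balance lies for the data at hand.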

Ridge Regression Implementation in Python using sklearn


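import numpy as np
from sklearn.linear_model import Ridge
# x_train, y_train, x_cv, y_cv are assumed to be already-prepared, scaled splits
# (the normalize= argument was removed in scikit-learn 1.2, so scale beforehand)
ridgeReg = Ridge(alpha=0.05)
## training the model
ridgeReg.fit(x_train, y_train)
pred_cv = ridgeReg.predict(x_cv)
## calculating mse
mse = np.mean((pred_cv - y_cv)**2)
## calculating score
score = ridgeReg.score(x_cv, y_cv)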


Comparative Analysis of Lasso and Ridge Regression

Ridge and lasso regression use two different penalty functions for regularisation: ridge regression uses the L2 penalty, whereas lasso regression uses the L1 penalty. In ridge regression, the penalty is the sum of the squares of the coefficients; in the lasso, it is the sum of the absolute values of the coefficients. The lasso therefore shrinks coefficients towards zero using absolute values (L1 penalty) rather than squares (L2 penalty).
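As a small worked example, the two penalties can be computed for the same coefficient vector. The vector b and the value of λ below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical coefficient vector and penalty strength, chosen for easy arithmetic
b = np.array([3.0, -4.0, 0.0])
lam = 0.5

l1_penalty = lam * np.sum(np.abs(b))   # lasso: 0.5 * (3 + 4 + 0) = 3.5
l2_penalty = lam * np.sum(b ** 2)      # ridge: 0.5 * (9 + 16 + 0) = 12.5
print("L1 penalty:", l1_penalty, "L2 penalty:", l2_penalty)
```

Because the L2 penalty squares each coefficient, it punishes large coefficients much more heavily than small ones, while the L1 penalty grows at the same rate for every coefficient.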

Dimension reduction of feature space with lasso

In ridge regression the coefficients are shrunk towards zero but are not set exactly to zero, so the model keeps either all of the features or none of them. The lasso regression algorithm, in contrast, performs parameter shrinkage and feature selection simultaneously and automatically, because it zeroes out the coefficients of collinear features. This makes it easier to select variables out of the given n variables when performing lasso regression.
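The difference can be seen by fitting both models to the same data and counting the coefficients that end up exactly zero. This is a sketch on hypothetical data in which only two of ten features matter; the alpha values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: ten features, of which only the first two matter
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 0.5 * rng.normal(size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that each model sets exactly to zero
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))
print("coefficients set exactly to zero -> lasso:", lasso_zeros, "ridge:", ridge_zeros)
```

The lasso drops uninformative features from the model entirely, whereas ridge keeps every feature with a small but non-zero weight.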

There is another type of regularization method, ElasticNet, which is a hybrid of lasso and ridge regression. It is trained with both the L1 and L2 penalties as regularizers. A practical advantage of trading off between the lasso and ridge is that it allows Elastic-Net to inherit some of ridge's stability under rotation.
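A minimal sketch with scikit-learn's ElasticNet is shown below, on hypothetical data. Its l1_ratio parameter sets the mix between the two penalties, and the alpha and l1_ratio values used here are arbitrary illustrations.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Hypothetical data: six features, of which only the first two matter
rng = np.random.default_rng(7)
X = rng.normal(size=(150, 6))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + 0.3 * rng.normal(size=150)

# l1_ratio=0.5 mixes the L1 and L2 penalties
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# l1_ratio=1.0 drops the L2 part, leaving the lasso penalty only
enet_as_lasso = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("elastic net coefficients:", enet.coef_)
```

Setting l1_ratio to 1.0 reproduces the lasso fit, which shows that the lasso is one end of the range ElasticNet covers.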


In this tutorial we gained a better understanding of lasso and ridge regression, the mathematics behind the algorithms, and their implementation in Python using sklearn. We also compared the two algorithms.

About The Author
Apurva Sharma
Hi, I am Apurva Sharma, a data science engineer, deep learning evangelist and DevOps enthusiast