Linear Regression fails here… is Ridge Regression a creative solution? Part 1


Reading Time: 5 Minutes

Is Ridge Regression an optimization of Linear Regression? Join me in this post to learn the perks of Ridge Regression and how it can solve the problems of high bias and model under-fitting.

Linear Regression fails when the data is sparsely spread (weakly correlated), when there are outliers that cannot be eliminated from the dataset, when the training data is too small to learn meaningful patterns, or when there is multicollinearity. These factors cause poor generalization, often due to high bias (under-fitting) or, in some cases, high variance.

The goals of this post are:

  • To help you develop an intuition for why Ridge Regression (L2 regularization) is needed.
  • To explain the logic and math behind Ridge Regression.

Pre-requisites:

  • A basic understanding of Linear Regression and how Ordinary Least Squares (OLS) works.
  • A basic understanding of differentiation and partial differentiation in calculus.

If you would like to build a strong foundation in Linear Regression and how Ordinary Least Squares (OLS) works, please click here to go to a post on sapiencespace dedicated entirely to that topic.

Linear Regression is a very popular and fundamental concept in Machine Learning that originates from statistics. It is a simple mathematical approach for predicting a target variable from a given (set of) independent variable(s).


In basic Linear Regression, if features are highly correlated (multicollinearity) or if there are too many irrelevant features, OLS can assign very large coefficients to some variables. Large coefficients make the model sensitive to small fluctuations in the input data, leading to poor generalization on new data (overfitting). The same problem arises when a single independent feature and the dependent feature are only weakly correlated.
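
To see this instability concretely, here is a small illustrative sketch (not code from this post) that fits plain OLS and Ridge on two nearly identical features; NumPy and scikit-learn are assumed to be available, and the data is made up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # x2 is almost a copy of x1 (multicollinearity)
y = 3 * x1 + rng.normal(scale=0.5, size=n)    # made-up target that really depends only on x1

X = np.column_stack([x1, x2])

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)            # alpha plays the role of the penalty strength

# OLS tends to split the effect between the two collinear columns with large,
# opposite-signed coefficients; Ridge keeps both coefficients small and stable.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```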

Since we are exploring this topic for educational understanding, we will not get into overly complex math involving many highly correlated independent variables (x). Instead, we'll derive Ridge Regression with just one independent variable and one dependent variable. In the next post, we'll work through Ridge Regression with two independent features used to predict a dependent variable.


The simple math behind Ridge Regression

[Image IMG 9912]

In OLS, we take the sum of squared residuals / mean squared error as our loss function and then carry out partial differentiation of the loss function with respect to the least-squares estimators (intercept and slope) in order to find the intercept and slope.
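
For reference, here is a rough sketch of that OLS loss with a single feature, using m for the slope, c for the intercept, and n data points:

    J_OLS(m, c) = Σ_{i=1..n} ( y_i − (m·x_i + c) )^2

Setting ∂J_OLS/∂c = 0 and ∂J_OLS/∂m = 0 yields the familiar least-squares intercept and slope.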

In Ridge Regression, we modify the least-squares cost function by adding a penalty term that prevents extreme slopes, as illustrated in the diagram below and written out in the sketch after this list:

  • The second term shrinks the slope m to avoid overfitting.
  • λ (lambda) is a hyper-parameter that controls how much we penalize large slopes.
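
In case the diagram is hard to read, the modified cost being described is, in essence:

    J_ridge(m, c) = Σ_{i=1..n} ( y_i − (m·x_i + c) )^2 + λ·m^2

The first term is the usual sum of squared residuals; the second term, λ·m^2, is the penalty that discourages large slopes. Note that the intercept c is not penalized.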

Rules of thumb for selecting lambda (we'll discuss choosing lambda in more detail in the implementation part):

  • If λ=0 → Ridge behaves like regular regression.
  • If λ is too large → The slope m shrinks too much, leading to under-fitting.
[Image IMG 9913: the Ridge Regression cost function with the penalty term]
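
As a quick, hedged illustration of these thumb rules (this is not code from the post; scikit-learn's alpha parameter plays the role of λ here, and the dataset is made up), we can watch the slope of a one-feature Ridge fit shrink as λ grows:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=30)   # made-up data with true slope around 2

for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    model = Ridge(alpha=lam).fit(x.reshape(-1, 1), y)
    # At lambda = 0 the slope matches ordinary least squares;
    # as lambda grows, the slope shrinks towards 0 (risking under-fitting).
    print(f"lambda = {lam:>6}: slope = {model.coef_[0]:.3f}")
```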

Then, we carry out the partial differentiation, as done earlier in the OLS post, in order to reach the optimal solution.

[Image IMG 9914: partial differentiation of the Ridge cost function with respect to the intercept]
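
A sketch of that step for the intercept, using x̄ and ȳ for the means of x and y:

    ∂J_ridge/∂c = −2 · Σ ( y_i − m·x_i − c ) = 0
    ⇒ c_ridge = ȳ − m·x̄

Because the penalty term λ·m^2 does not involve c, the intercept takes the same form as in OLS.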

Once we obtain the value of c_ridge, we can proceed to find the value of m_ridge. The key here is to keep the equation as simple as possible through mathematical simplifications and substitutions.

[Image IMG 9915: substituting the intercept back in to solve for the slope]
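
Roughly, the step being carried out here is to differentiate with respect to m and substitute the intercept found above:

    ∂J_ridge/∂m = −2 · Σ x_i·( y_i − m·x_i − c ) + 2·λ·m = 0

    substituting c = ȳ − m·x̄ and rearranging gives
    m · ( Σ x_i·(x_i − x̄) + λ ) = Σ x_i·( y_i − ȳ )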

The highlighted terms can be simplified as shown below; the simplification comes from recognizing the pattern in the highlighted expression and verifying the simplified form.

The second simplification is done because the right-hand side of the equation resembles the expanded form of the covariance of x and y.

[Image IMG 9917: the simplified expression for the Ridge slope]
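
Using the identities Σ x_i·(x_i − x̄) = Σ (x_i − x̄)^2 and Σ x_i·(y_i − ȳ) = Σ (x_i − x̄)·(y_i − ȳ) (the covariance-like sum mentioned above), the slope reduces to:

    m_ridge = Σ (x_i − x̄)·(y_i − ȳ) / ( Σ (x_i − x̄)^2 + λ )

Setting λ = 0 recovers the ordinary least-squares slope, while a larger λ shrinks the slope towards zero.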

So, once we have the equations for both the slope and the intercept, we can plug the values into the goal equation to create a prediction line for the data.
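
To make that final step concrete, here is a minimal from-scratch sketch (not code from this post) that implements the formulas derived above; the variable names, dataset, and lambda value are purely illustrative:

```python
import numpy as np

def ridge_fit_1d(x, y, lam):
    """Closed-form Ridge fit for one feature: returns (slope, intercept)."""
    x_bar, y_bar = x.mean(), y.mean()
    # m_ridge = sum((x_i - x_bar)(y_i - y_bar)) / (sum((x_i - x_bar)^2) + lambda)
    m = np.sum((x - x_bar) * (y - y_bar)) / (np.sum((x - x_bar) ** 2) + lam)
    # c_ridge = y_bar - m * x_bar (the intercept is not penalized)
    c = y_bar - m * x_bar
    return m, c

# Tiny made-up dataset, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

m, c = ridge_fit_1d(x, y, lam=1.0)
y_pred = m * x + c   # the prediction line: y = m*x + c
print("slope:", m, "intercept:", c)
```

Increasing lam shrinks the returned slope towards zero, in line with the thumb rules above.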

In the upcoming post, we'll delve deeper into how to handle multicollinearity effectively with Ridge Regression using two independent variables, how to choose the right lambda value, and how the R^2 metric works. So stay tuned, post your doubts (if any) in the comment section below, and please hit the like button to help us deliver valuable content to more curious learners like you.

Cover picture and title image credits – unsplash content creators

Click here to view all the concepts related to machine learning.

