TransWikia.com

What is the intuition behind decreasing the slope when using regularization?

Data Science Asked on September 5, 2021

While training a logistic regression model, using regularization can help distribute weights and avoid reliance on some particular weight, making the model more robust.

E.g., suppose my input vector is 4-dimensional with values [1,1,1,1]. The output can be 1 if my weight vector is [1,0,0,0] or [0.25,0.25,0.25,0.25]. An L2 penalty would prefer the latter weight vector (because pow(1, 2) > 4*pow(0.25, 2), i.e. 1 > 0.25). I understand intuitively why L2 regularization can be beneficial here.
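To make the arithmetic concrete, here is a minimal sketch in plain Python. The names `w_sparse`, `w_spread` and `l2_penalty` are made up for illustration; the penalty is taken to be the sum of squared weights.

```python
# Input and the two candidate weight vectors from the example above.
x = [1.0, 1.0, 1.0, 1.0]
w_sparse = [1.0, 0.0, 0.0, 0.0]
w_spread = [0.25, 0.25, 0.25, 0.25]

def dot(w, x):
    """Linear output w . x (before any sigmoid)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def l2_penalty(w):
    """Sum of squared weights, i.e. the squared L2 norm."""
    return sum(wi ** 2 for wi in w)

out_sparse = dot(w_sparse, x)      # both weight vectors give the same output
out_spread = dot(w_spread, x)
pen_sparse = l2_penalty(w_sparse)  # 1^2 = 1.0
pen_spread = l2_penalty(w_spread)  # 4 * 0.25^2 = 0.25

print(out_sparse, out_spread, pen_sparse, pen_spread)
```

Both weight vectors produce the same output, but the spread-out one has a quarter of the penalty, so an L2-regularized fit would favour it.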

But in the case of linear regression, L2 regularization reduces the slope. Why does only reducing the slope give better performance? Is increasing the slope also an alternative?

2 Answers

By using regularization to shrink the parameters, we reduce the sampling variance of the estimates and the tendency to fit random noise, which is what we want to avoid. Increasing the slope is not an alternative: the penalty grows with the size of the coefficients, so it can only pull them towards zero, and larger coefficients would make the fit more sensitive to noise, not less.

L2 doesn’t necessarily reduce the number of features; rather, it reduces the magnitude of the impact that each feature has on the model by reducing its coefficient value.

Shrinking has a positive effect when we have overestimated a coefficient and a negative effect when we have underestimated it. But we are not shrinking every coefficient by the same amount: the shift is larger the further the estimate is from zero.

Shrinking all the slopes towards zero will make some of them more accurate and some of them less accurate, but you can see how it would make them collectively more accurate.
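As a rough illustration of the shrinkage and variance points above, here is a small simulation sketch. It assumes a one-feature regression with no intercept, where the least-squares slope is sum(x*y)/sum(x*x) and the ridge slope is sum(x*y)/(sum(x*x) + lam). The data, `true_slope`, and `lam` are invented for the example.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope for y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def ridge_slope(xs, ys, lam):
    """Ridge slope: same numerator, denominator inflated by lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

random.seed(0)
true_slope = 2.0                   # assumed ground truth for the simulation
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]   # fixed inputs, only the noise is resampled
lam = 1.0                          # assumed penalty strength

ols_estimates, ridge_estimates = [], []
for _ in range(1000):
    ys = [true_slope * x + random.gauss(0.0, 1.0) for x in xs]
    ols_estimates.append(ols_slope(xs, ys))
    ridge_estimates.append(ridge_slope(xs, ys, lam))

ols_var = statistics.variance(ols_estimates)
ridge_var = statistics.variance(ridge_estimates)
print("variance of OLS slopes:  ", ols_var)
print("variance of ridge slopes:", ridge_var)
```

In this setup the ridge slope is always the least-squares slope multiplied by a factor between 0 and 1, so it is never larger in magnitude and its variance across resampled datasets is smaller. The price is a bias towards zero, and whether the trade is worthwhile depends on the choice of `lam`.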

Answered by prashant0598 on September 5, 2021

Please refer to this article on L1 and L2 regularization: https://towardsdatascience.com/intuitions-on-l1-and-l2-regularisation-235f2db4c261

L1 (known as Lasso) and L2 (known as Ridge) both add a penalty on the weights to the loss that gradient descent minimizes, which limits how far the weights can grow in an attempt to reduce overfitting. As far as I know, both shrink the coefficients, but only L1 tends to drive the coefficients of less useful features all the way to zero; L2 makes them smaller without removing them.
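The difference can be sketched in the simplest possible setting: a single coefficient with unpenalized estimate `b`, where we minimize 0.5*(w - b)^2 plus a penalty. This is an assumption made for illustration, not a general solver. With the L1 penalty lam*|w| the minimizer is the soft-thresholded value; with the L2 penalty 0.5*lam*w^2 it is b / (1 + lam). The function names and numbers below are invented.

```python
def l1_shrink(b, lam):
    """Minimizer of 0.5*(w - b)**2 + lam*abs(w): soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small estimates are set exactly to zero

def l2_shrink(b, lam):
    """Minimizer of 0.5*(w - b)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return b / (1.0 + lam)

unpenalized = [3.0, 0.5, -0.2]  # hypothetical unpenalized coefficients
lam = 1.0

lasso_like = [l1_shrink(b, lam) for b in unpenalized]
ridge_like = [l2_shrink(b, lam) for b in unpenalized]
print("L1:", lasso_like)
print("L2:", ridge_like)
```

Here the L1 rule zeroes out the two small coefficients and keeps the large one, while the L2 rule halves all three and leaves none at exactly zero.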

Answered by vivek on September 5, 2021
