
Which is a better form of regularization: lasso (L1) or ridge (L2)?

Asked by jaeger6 on December 27, 2021 (Artificial Intelligence)

Given a ridge and a lasso regularizer, which one should be chosen for better performance?

An intuitive graphical explanation (intersection of the elliptical contours of the loss function with the region of constraints) would be helpful.

One Answer

The following graph shows the constraint regions (green) along with contours of the residual sum of squares (red ellipses). These are iso-lines: every point on a given ellipse has the same RSS.

Figure: Lasso (left) and Ridge (right) constraints [Source: Elements of Statistical Learning]
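To reproduce that picture yourself, here is a minimal matplotlib sketch; the loss centre, contour levels, and constraint size $d$ below are made-up values chosen purely for illustration:

```python
# Sketch of the lasso (diamond) and ridge (disk) constraint regions,
# with elliptical RSS contours centred at a hypothetical unconstrained minimum.
import numpy as np
import matplotlib.pyplot as plt

beta_hat = np.array([1.2, 0.9])  # made-up unconstrained RSS minimum

# Toy quadratic loss whose iso-lines are ellipses around beta_hat
b1, b2 = np.meshgrid(np.linspace(-2, 3, 400), np.linspace(-2, 3, 400))
rss = 2 * (b1 - beta_hat[0]) ** 2 + (b2 - beta_hat[1]) ** 2

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, title in zip(axes, ["Lasso: |b1| + |b2| <= d", "Ridge: b1^2 + b2^2 <= d^2"]):
    ax.contour(b1, b2, rss, levels=[0.5, 1.5, 3.0], colors="red")
    ax.axhline(0, color="gray", lw=0.5)
    ax.axvline(0, color="gray", lw=0.5)
    ax.set_title(title)
    ax.set_aspect("equal")

d = 1.0
# L1 constraint: a diamond whose corners sit on the axes
axes[0].fill([d, 0, -d, 0], [0, d, 0, -d], color="green", alpha=0.4)
# L2 constraint: a circle with no corners
theta = np.linspace(0, 2 * np.pi, 200)
axes[1].fill(d * np.cos(theta), d * np.sin(theta), color="green", alpha=0.4)

plt.tight_layout()
plt.show()
```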

Because the ridge constraint ($\beta_1^2 + \beta_2^2 \le d$) is a disk with no corners, the intersection with the contours will rarely occur on an axis, so the ridge regression parameters will usually all be non-zero.

By contrast, the lasso constraint ($|\beta_1| + |\beta_2| \le d$) has corners on the axes, so the ellipse will often intersect the constraint region at an axis. In 2D this results in one of the parameters becoming exactly zero, while in higher dimensions several parameter estimates may reach zero simultaneously.

This is a disadvantage of ridge regression: the least important predictors are never eliminated, so the final model contains all predictor variables. With the lasso, the L1 penalty forces some parameters to be exactly zero when $\lambda$ is sufficiently large. This acts as a form of variable selection (a dimensionality-reduction effect) and yields sparse models.
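You can see the sparsity effect numerically with a small scikit-learn sketch on synthetic data (the dataset, feature counts, and alpha value are arbitrary choices for illustration; scikit-learn's `alpha` plays the role of $\lambda$):

```python
# Compare lasso (L1) and ridge (L2) coefficients at the same penalty strength.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 predictors, only 3 of which are actually informative (made-up setup)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)

print("lasso zero coefficients:", np.sum(lasso.coef_ == 0))  # often several exact zeros
print("ridge zero coefficients:", np.sum(ridge.coef_ == 0))  # typically none; only shrunk
```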

When the number of predictors is small, L2 may be preferred over L1, since it constrains the coefficient norm while retaining all predictor variables.
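In practice, "better performance" is an empirical question, and cross-validation on your own data is the standard way to decide. A sketch using scikit-learn's built-in CV estimators (same synthetic data as above; the alpha grid is an arbitrary assumption):

```python
# Choose between L1 and L2 by cross-validated predictive performance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

alphas = np.logspace(-3, 2, 30)  # penalty strengths to search over
for name, model in [("lasso", LassoCV(alphas=alphas, cv=5)),
                    ("ridge", RidgeCV(alphas=alphas, cv=5))]:
    score = cross_val_score(model, X, y, cv=5).mean()  # mean R^2 across folds
    print(name, round(score, 3))
```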

Answered by s_bh on December 27, 2021
