
Why is L2 loss more commonly used in Neural Networks than other loss functions?

Artificial Intelligence Asked by Ali KHalili on September 27, 2020

Why is L2 loss more commonly used in Neural Networks than other loss functions?
What is the reason for L2 being the default choice in Neural Networks?

One Answer

I'll cover both L2 regularized loss and Mean-Squared Error (MSE) loss:

MSE:

  1. L2 loss is continuously differentiable everywhere, unlike L1 loss, which is not differentiable at zero. This makes training more stable and makes the loss well suited to smooth gradient-based optimization (see the sketch after this list).
  2. Using L2 loss (without any regularization) corresponds to the Ordinary Least Squares estimator, which, if you can invoke the Gauss-Markov assumptions, gives some useful theoretical guarantees about your estimator/model (e.g. that it is the "Best Linear Unbiased Estimator"). Source: https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem.
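To make the differentiability point concrete, here is a minimal sketch (assuming NumPy; the function names are purely illustrative, not from any particular library) comparing the gradients of the squared and absolute losses as the residual shrinks:

```python
import numpy as np

def l2_grad(residual):
    # Gradient of 0.5 * residual**2: smooth, and shrinks linearly toward 0.
    return residual

def l1_grad(residual):
    # (Sub)gradient of |residual|: constant magnitude, with a jump at 0.
    return np.sign(residual)

residuals = np.array([-2.0, -0.5, -0.01, 0.01, 0.5, 2.0])
print(l2_grad(residuals))  # e.g. [-2.  -0.5 -0.01  0.01  0.5  2. ]  -- decays smoothly to 0
print(l1_grad(residuals))  # e.g. [-1. -1. -1.  1.  1.  1.]          -- constant size, flips sign at 0
```

The L2 gradient decays smoothly as predictions approach targets, whereas the L1 (sub)gradient keeps the same magnitude and flips sign abruptly, which makes gradient steps near the optimum noisier.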

L2 Regularization:

  1. Using L2 regularization is equivalent to placing a Gaussian prior (see https://stats.stackexchange.com/questions/163388/why-is-the-l2-regularization-equivalent-to-gaussian-prior) on your model/estimator. If you frame your problem as Maximum A Posteriori (MAP) inference and your likelihood model p(y|x) is Gaussian, then your posterior distribution over the parameters p(x|y) will also be Gaussian. From Wikipedia: "If the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian" (source: https://en.wikipedia.org/wiki/Conjugate_prior). See the sketch after this list.

  2. As in the case above, the L2 penalty is continuously differentiable everywhere, unlike the L1 penalty, which is not differentiable at zero.
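As a rough illustration of the Gaussian-prior equivalence, here is a small sketch (assuming NumPy and a linear model; the data and variable names are made up for illustration) showing that the minimizer of the L2-regularized least-squares objective coincides with the MAP estimate under a Gaussian likelihood and a zero-mean Gaussian prior when lambda = sigma^2 / tau^2:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

lam = 1.0  # L2 regularization strength

# Minimizer of ||y - Xw||^2 + lam * ||w||^2  (ridge / L2-regularized least squares).
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# MAP estimate under likelihood y ~ N(Xw, sigma^2 I) and prior w ~ N(0, tau^2 I).
# With lam = sigma^2 / tau^2 this is the same linear system, hence the same solution.
sigma2, tau2 = 1.0, 1.0  # chosen so that sigma2 / tau2 == lam
w_map = np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(3), X.T @ y)

print(np.allclose(w_ridge, w_map))  # True
```

Increasing lambda corresponds to a tighter (smaller-variance) Gaussian prior around zero, i.e. a stronger prior belief that the weights should be small.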

Correct answer by Ryan Sander on September 27, 2020
