Cross Validated Asked on January 3, 2022

For some context, I shall outline my current understanding:

Considering a Neural Network, for a Binary Classification problem, the Cross-entropy cost function, J, is defined as:

$ J = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)}) \right] $

- $m$ = number of training examples
- $y^{(i)}$ = class label of example $i$ (0 or 1)
- $a^{(i)}$ = output prediction for example $i$ (value between 0 and 1)
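As a quick illustration of this formula, here is a minimal NumPy sketch of the cost (the function name and example values are my own, not from the question):

```python
import numpy as np

def cross_entropy_cost(y, a, eps=1e-12):
    """Binary cross-entropy cost J averaged over m examples.

    y : array of class labels (0 or 1)
    a : array of predicted probabilities in (0, 1)
    """
    a = np.clip(a, eps, 1 - eps)  # guard against log(0)
    m = y.shape[0]
    return -(1.0 / m) * np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

y = np.array([1, 0, 1, 1])
a = np.array([0.9, 0.1, 0.8, 0.6])
J = cross_entropy_cost(y, a)  # roughly 0.236
```

The cost is small when confident predictions match the labels and grows without bound as a confident prediction lands on the wrong class.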

Dropout regularisation works as follows: For a given training example, we randomly shut down some nodes in a layer according to some probability. This has the effect of keeping the weights low during training and hence regularises the network and prevents overfitting.
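The mechanics described above can be sketched as a forward-pass helper. This is the common "inverted dropout" variant, in which surviving activations are rescaled by the keep probability so their expected value is unchanged; the function name and shapes are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, keep_prob=0.8, training=True):
    """Inverted dropout: zero each unit with probability 1 - keep_prob,
    then rescale survivors by 1/keep_prob so the expected activation
    is the same with and without dropout."""
    if not training:
        return activations  # no units are dropped at test time
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

a = np.ones((4, 5))          # toy layer activations
a_drop = dropout_forward(a)  # each entry is either 0 or 1/0.8 = 1.25
```

A fresh random mask is drawn for every training example (or mini-batch), which is what produces the "thinned" networks discussed in the answer below.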

I have learnt that if we apply dropout regularisation, the cross-entropy cost function is no longer easy to define, due to all the intermediate probabilities. Why is this the case? Why doesn't the old definition still hold? As long as the network learns better parameters, won't the cross-entropy cost decrease on every iteration of gradient descent? Thanks in advance.

Dropout does not change the cost function, and you do not need to make changes to the cost function when using dropout.

The reasoning is that dropout is a way to average over an ensemble of each of the exponentially-many "thinned" networks resulting from dropping units randomly. In this light, each time you apply dropout and compute the loss, you're computing the loss that corresponds to a randomly-selected thinned network; collecting together many of these losses reflects a distribution of losses over these networks. Of course, the loss surface is noisier as a result, so model training takes longer. The goal of training the network in this way is to obtain a model that is averaged over all of these different "thinned" networks.
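The point about a distribution of losses can be made concrete with a toy network. In this sketch (all weights and data are made up for illustration), evaluating the same fixed parameters under different random dropout masks yields a different loss each time, because each mask selects a different thinned network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-hidden-layer binary classifier with fixed (untrained) weights
X = rng.normal(size=(8, 3))        # 8 examples, 3 features
y = rng.integers(0, 2, size=8)     # binary labels
W1 = rng.normal(size=(3, 16))
W2 = rng.normal(size=(16, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_with_dropout(keep_prob=0.5):
    h = np.maximum(X @ W1, 0)                 # ReLU hidden layer
    mask = rng.random(h.shape) < keep_prob    # picks one thinned network
    h = h * mask / keep_prob                  # inverted dropout
    a = np.clip(sigmoid(h @ W2).ravel(), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

# Same weights, same data: the loss still varies from call to call,
# reflecting a distribution over randomly thinned networks.
losses = [loss_with_dropout() for _ in range(100)]
```

The weights never change between calls, yet `losses` has nonzero spread; that per-mask variability is exactly the extra noise in the training loss, and it is a property of how the network is evaluated, not a change to the cross-entropy cost function itself.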

For more information, see How to explain dropout regularization in simple terms? or the original paper: Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", *Journal of Machine Learning Research*, 2014.

Answered by Sycorax on January 3, 2022
