# Understanding the Bayesian Occam's razor graph

Artificial Intelligence Asked by DuttaA on November 7, 2020

I came across the concept of Bayesian Occam Razor in the book Machine Learning: a Probabilistic Perspective. According to the book:

> Another way to understand the Bayesian Occam's razor effect is to note that probabilities must sum to one. Hence $$\sum_{D'} p(D'|m) = 1$$, where the sum is over all possible data sets. Complex models, which can predict many things, must spread their probability mass thinly, and hence will not obtain as large a probability for any given data set as simpler models. This is sometimes called the conservation of probability mass principle.

The figure below is used to explain the concept. Figure explanation: on the vertical axis we plot the predictions of 3 possible models: a simple one, $$M_1$$; a medium one, $$M_2$$; and a complex one, $$M_3$$. We also indicate the actually observed data $$D_0$$ by a vertical line. Model 1 is too simple and assigns low probability to $$D_0$$. Model 3 also assigns $$D_0$$ relatively low probability, because it can predict many data sets, and hence it spreads its probability quite widely and thinly. Model 2 is "just right": it predicts the observed data with a reasonable degree of confidence, but does not predict too many other things. Hence model 2 is the most probable model.

What I do not understand is this: when a complex model is used, it will likely overfit the data, and hence the plot for a complex model should look like a bell curve with its peak at $$D_0$$, while simpler models should have a broader bell shape. But the graph here shows something else entirely. What am I missing?

The original graph for the aforementioned Bayesian Occam's razor is similar to the graph in these slides (slide 18), along with the calculations.

So, according to the tutorial, the graph shown should actually have the term $$p(D|m)$$ on the y-axis, making it a generative model. Now the graph starts to make sense: a model with low complexity cannot produce very complex datasets, so its mass will be centred around the simplest datasets, while very complex models can produce richer datasets, which forces them to assign probability thinly over all the datasets (to keep $$\sum_{D'} p(D'|m) = 1$$). Answered by DuttaA on November 7, 2020
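The conservation-of-mass argument can be sketched numerically. This is a toy illustration, not from the book: the discrete dataset space, the three likelihood shapes, and the observed dataset `D0` below are all invented for the sake of the example.

```python
import math

# Hypothetical setup: a discrete space of 101 possible datasets, ordered
# roughly by complexity, and three invented models whose likelihoods
# p(D | m) must each sum to one over that space.
datasets = range(101)

def normalised(weights):
    """Scale weights so that sum over all datasets D' of p(D' | m) is 1."""
    total = sum(weights)
    return [w / total for w in weights]

# M1 (simple): concentrates nearly all its mass on the simplest datasets.
p_m1 = normalised([math.exp(-0.5 * (d / 5.0) ** 2) for d in datasets])
# M2 (medium): covers a moderate range of datasets.
p_m2 = normalised([math.exp(-0.5 * (d / 20.0) ** 2) for d in datasets])
# M3 (complex): can generate any dataset, so its mass is spread thinly.
p_m3 = normalised([1.0 for _ in datasets])

D0 = 30  # the observed, moderately complex dataset

# Because each distribution sums to one, M2 wins at D0: M1 barely
# reaches D0 at all, and M3's uniform spread leaves too little mass
# on any single dataset.
print(p_m1[D0], p_m2[D0], p_m3[D0])
```

Running this shows exactly the ordering in the figure: the medium model assigns $$D_0$$ the highest probability, the uniform (complex) model comes second, and the simple model assigns it almost nothing.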
