Cross Validated Asked by P Lrc on January 3, 2022

I’m trying to understand Bayesian statistics. Recently I asked here whether we estimate the parameters of the a priori distribution in Bayesian statistics. The response was that we typically don’t estimate them unless we’re using Empirical Bayes, because we’re going to "update" the a priori distribution anyway.

On Wikipedia I’ve read:

Conjugate priors are especially useful for sequential estimation, where the posterior of the current measurement is used as the prior in the next measurement. In sequential estimation, unless a conjugate prior is used, the posterior distribution typically becomes more complex with each added measurement, and the Bayes estimator cannot usually be calculated without resorting to numerical methods.

I thought that maybe we assume some a priori distribution, get our observations, calculate the a posteriori distribution, treat it as our new a priori distribution, and repeat this procedure until convergence.

Unfortunately I’ve realised that this doesn’t make sense, since for example for the Poisson-Gamma model with an a priori distribution with parameters $\gamma, \beta$, the a posteriori distribution is again a gamma distribution with parameters

$$\gamma' = \gamma + \sum_{j=1}^n X_j$$

$$\beta' = \beta + n$$

and such parameters cannot be "convergent". So:

(a) why don’t we need to bother ourselves with the exact form of the a priori distribution in pure Bayesian statistics?

(b) how do we "update" the a priori distribution?

(c) what exactly does sequential estimation mean?

So a couple things to clarify:

- Prior Distribution: This typically represents the information that the modeler has about the system before looking at the data, expressed in probabilistic terms. There are many schools of thought on how exactly one should do this, and it is context dependent.

For concreteness, suppose we are medical researchers trying to evaluate the effectiveness of a treatment ($A$) on some (continuous) quality of life measure ($Y$), controlling for a vector of baseline covariates ($X$).

Suppose we model the data generating likelihood as normal:

$Y \mid A, X \sim N(\alpha_y + A\beta_a + X\beta_x, \sigma)$

Now our prior is the joint distribution of the parameters, $p(\alpha_y, \beta_a, \beta_x, \sigma)$, which we can specify however we want to represent what we know. In a medical context we might be able to bring in information from other studies or theoretical knowledge about how the control variables might impact the outcome. Or we could express some notion of ignorance with these priors.

Sometimes we might write down a family of distributions to represent the priors, but be unsure how to parametrize them. This is where we have the option to estimate those hyper-parameters with methods like Empirical Bayes, or to specify a hyper-prior distribution for them. In either case, they are just part of the prior and of how we express the information and ignorance that we have before looking at the data. So to answer question (a), we do need to worry about the prior and its form. The prior will impact our inferences and decisions later on, but there are different approaches to specifying it. These include Jaynes and the "objective school" (https://bayes.wustl.edu/etj/articles/prior.pdf), priors in the context of the likelihood (https://arxiv.org/pdf/1708.07487.pdf), the aforementioned Empirical Bayes approach, and many more. The prior is a big part of building a Bayesian model.
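To illustrate how the choice of prior feeds into the inference, here is a minimal sketch using the Poisson-Gamma model from the question rather than the regression above. The two priors and the count data are invented for illustration. With only a few observations the two posterior means disagree noticeably; with many observations they end up close together.

```python
def posterior_mean(prior_shape, prior_rate, counts):
    """Posterior mean of a Poisson rate under a Gamma(shape, rate) prior."""
    return (prior_shape + sum(counts)) / (prior_rate + len(counts))

# Two hypothetical priors that disagree strongly about the rate
vague_prior = (1.0, 1.0)        # prior mean 1
confident_prior = (50.0, 5.0)   # prior mean 10

small_data = [3, 5, 4]          # invented counts, 3 observations
large_data = [3, 5, 4] * 100    # same pattern repeated, 300 observations

for label, data in [("small", small_data), ("large", large_data)]:
    m_vague = posterior_mean(*vague_prior, data)
    m_confident = posterior_mean(*confident_prior, data)
    print(f"{label} sample: posterior means {m_vague:.2f} vs {m_confident:.2f}")
```

How quickly the prior's influence fades depends on the model and on how informative the prior is, so this should be read as one example and not as a general guarantee.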

Now we get to the updating. Finding the posterior is often referred to as updating. If we let $\theta = (\alpha_y, \beta_a, \beta_x, \sigma)$ be the vector of parameters, the posterior is:

$p(\theta \mid Y, A, X) \propto p(Y \mid A, X, \theta) \, p(\theta)$, where $p(Y \mid A, X, \theta)$ is the normal likelihood above and $p(\theta)$ is the prior.

The way to think of this update is in terms of information. The prior is our information or ignorance beforehand, and the posterior is the best representation of our knowledge of the parameters, combining what we knew before with what the data tell us through the likelihood. It represents the current state of all of our knowledge in the form of a probability distribution. (In decision-theoretic approaches to Bayesian probability, this can be formalized as, in some sense, an optimal updating of the prior information given the evidence from the data; see Bernardo and Smith (1994) for example.)

The posterior is the update.

However, I think I see the possible source of your confusion. I think you are asking when we stop. The answer is that we update whenever we get new information (typically this means data).

So say we conduct our experiment on the treatment ($A$) and we get our posterior. We could potentially run another study. Ideally, the posterior from the first study becomes our prior for the second study, since it represents everything we know about the parameters before incorporating the knowledge from this second experiment. This kind of thing happens all the time in industry, where data or information might come in batches, and you then iteratively update your knowledge and thus the posterior. I believe this is what they mean by sequential estimation. The key is that each update has to occur with more information: re-using the same data for a second update would count that evidence twice, so there is no loop to run "until convergence".
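As a minimal sketch of this batch-by-batch updating, here is the Poisson-Gamma model from the question in Python. The prior values and the count data are invented for illustration. Each batch's posterior becomes the prior for the next batch, and the end result matches a single update on all the data pooled together.

```python
def gamma_poisson_update(shape, rate, counts):
    """Conjugate update: Gamma(shape, rate) prior plus Poisson counts."""
    return shape + sum(counts), rate + len(counts)

# Hypothetical data arriving in three batches (values invented)
batches = [[3, 5, 4], [6, 2, 4, 5], [4, 4, 3]]
prior_shape, prior_rate = 2.0, 1.0   # assumed starting prior

# Sequential: yesterday's posterior is today's prior
seq_shape, seq_rate = prior_shape, prior_rate
variances = []
for i, batch in enumerate(batches, start=1):
    seq_shape, seq_rate = gamma_poisson_update(seq_shape, seq_rate, batch)
    mean = seq_shape / seq_rate
    variances.append(seq_shape / seq_rate ** 2)
    print(f"after batch {i}: Gamma({seq_shape}, {seq_rate}), "
          f"mean={mean:.3f}, var={variances[-1]:.3f}")

# One-shot: update the original prior with all data at once
pooled = [x for batch in batches for x in batch]
pooled_shape, pooled_rate = gamma_poisson_update(prior_shape, prior_rate, pooled)

print("sequential == pooled:", (seq_shape, seq_rate) == (pooled_shape, pooled_rate))
```

Note that the parameters themselves keep growing with every batch, so they never "converge". What changes is the distribution they describe: in this example the posterior variance shrinks as the batches accumulate.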

They also talk about how the posterior becomes more complex and requires numerical methods when the priors are not conjugate. In the real world this is usually the case: our information is not always conveniently represented by a conjugate family. Then we have to rely on numerical methods to estimate the posterior. This can get very complicated in sequential analyses and may require approximations in order to pass the information from one experiment to the next when the posterior is not available in closed form or is not easy to sample from.
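For a one-parameter problem, one simple numerical option is to evaluate the posterior on a grid. The sketch below is only an illustration under assumptions of my own: a Poisson likelihood, a log-normal prior on the rate (which is not conjugate), an arbitrary grid, and invented counts. The grid of log densities is what gets passed from one batch to the next.

```python
import math

def log_lognormal_prior(lam, mu=0.0, sigma=1.0):
    """Log-normal log density for lam, dropping the additive constant."""
    return -math.log(lam) - (math.log(lam) - mu) ** 2 / (2 * sigma ** 2)

def poisson_loglik(lam, counts):
    """Poisson log-likelihood, dropping terms that do not depend on lam."""
    return sum(x * math.log(lam) - lam for x in counts)

def grid_update(grid, log_prior, counts):
    """Unnormalized log posterior at each grid point."""
    return [lp + poisson_loglik(lam, counts) for lam, lp in zip(grid, log_prior)]

def grid_mean(grid, log_density):
    """Posterior mean from unnormalized log densities on an even grid."""
    top = max(log_density)  # subtract the max to avoid overflow in exp
    weights = [math.exp(ld - top) for ld in log_density]
    return sum(lam * w for lam, w in zip(grid, weights)) / sum(weights)

grid = [0.05 * k for k in range(1, 401)]         # rate values 0.05 .. 20
batches = [[3, 5, 4], [6, 2, 4, 5], [4, 4, 3]]   # invented counts

# Sequential: the grid posterior after one batch is the prior for the next
log_post = [log_lognormal_prior(lam) for lam in grid]
for batch in batches:
    log_post = grid_update(grid, log_post, batch)
seq_mean = grid_mean(grid, log_post)

# One-shot: all data at once, starting from the original prior
pooled = [x for batch in batches for x in batch]
log_prior = [log_lognormal_prior(lam) for lam in grid]
pooled_mean = grid_mean(grid, grid_update(grid, log_prior, pooled))

print(f"sequential: {seq_mean:.4f}, pooled: {pooled_mean:.4f}")
```

A grid like this stops being practical as the number of parameters grows. For a model like the regression above one would typically turn to sampling or other approximations, and that is where carrying the posterior forward between experiments becomes harder.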

Answered by Tyrel Stokes on January 3, 2022
