
How does multiplying by the log of the number of observations contribute to the Bayesian assumption in the BIC calculation?

Asked by Eddie S on Cross Validated, January 30, 2021

The model selection process applies both AIC and BIC in various situations:

$$\operatorname{AIC} = -2\ln(\text{likelihood}) + 2k$$
and

$$\operatorname{BIC} = -2\ln(\text{likelihood}) + \ln(N)\,k$$

$k$ = model degrees of freedom
$N$ = number of observations

The only difference I can see between the AIC and the BIC is the $\ln(N)$ factor. Could anyone please clarify how multiplying by $\ln(N)$ contributes to a Bayesian assumption? If it does not, why is this called a Bayesian information criterion?
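For concreteness, here is how I am computing both criteria on a toy example (a Gaussian fitted by maximum likelihood; the data and numbers are purely illustrative):

```python
import numpy as np

# Toy data: N observations from a Gaussian, fitted by maximum likelihood
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100)

N = y.size
mu_hat = y.mean()        # MLE of the mean
sigma2_hat = y.var()     # MLE of the variance (the 1/N version)
k = 2                    # estimated parameters: mu and sigma^2

# Maximized Gaussian log-likelihood
log_lik = -0.5 * N * (np.log(2 * np.pi * sigma2_hat) + 1)

aic = -2 * log_lik + 2 * k
bic = -2 * log_lik + np.log(N) * k   # the only difference: ln(N) in place of 2

print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```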

Many Thanks

One Answer

All information criteria are Bayesian, including both the AIC and the BIC. You could also argue that all Bayesian methods are an offshoot of information theory. For example, you can directly derive the K-L divergence from the Bayesian predictive distribution and the distribution in nature. There is no logical reason you cannot go in the opposite direction.
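As a rough sketch of that direction (a toy Monte Carlo check with an assumed "nature" distribution and an assumed normal predictive; none of this comes from the references below):

```python
import numpy as np
from scipy import stats

# Toy setup: the "distribution in nature" and a Bayesian posterior predictive
# distribution, both taken to be normal here purely for illustration.
nature = stats.norm(loc=0.0, scale=1.0)
predictive = stats.norm(loc=0.1, scale=1.1)

# Monte Carlo estimate of KL(nature || predictive)
# = E_nature[ log p_nature(x) - log p_predictive(x) ]
rng = np.random.default_rng(0)
x = nature.rvs(size=200_000, random_state=rng)
kl = np.mean(nature.logpdf(x) - predictive.logpdf(x))
print(f"Estimated K-L divergence: {kl:.4f}")
```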

What differs among the information criteria is the implicit prior over the model space. The BIC imposes a uniform distribution over the model space. The AIC and the BIC converge under some relatively common conditions, in the sense that the difference in the prior becomes so small that it does not matter in any computational sense. That does not mean they produce the same numerical value. It does mean that they rank the candidate models in the same order.

The implicit prior for the AIC is $$\Pr(M_i)=\frac{\exp\left(\tfrac{1}{2}k_i\log(N)-k_i\right)}{\sum_{j=1}^{I}\exp\left(\tfrac{1}{2}k_j\log(N)-k_j\right)},$$ where $M_i$ is the $i$th model. The implicit prior for the BIC, given $I$ models, is $$\Pr(M_i)=I^{-1}.$$
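To see why this prior is "implicit" in the AIC, here is a small numeric check (a sketch with made-up log-likelihoods; the numbers are hypothetical): combining $\exp(-\tfrac{1}{2}\operatorname{BIC}_i)$ with this prior reproduces the usual Akaike weights $\propto \exp(-\tfrac{1}{2}\operatorname{AIC}_i)$.

```python
import numpy as np

# Hypothetical candidate models: made-up maximized log-likelihoods and
# parameter counts; the values are illustrative only.
N = 100
log_lik = np.array([-210.0, -205.0, -204.0])
k = np.array([2, 4, 7])

aic = -2 * log_lik + 2 * k
bic = -2 * log_lik + np.log(N) * k

# Implicit AIC prior over the model space, as in the formula above
prior = np.exp(0.5 * k * np.log(N) - k)
prior /= prior.sum()

# Approximate posterior model probabilities: exp(-BIC/2) weighted by that prior
# (BIC differences keep the exponentials well scaled)
post_from_bic = np.exp(-0.5 * (bic - bic.min())) * prior
post_from_bic /= post_from_bic.sum()

# The usual Akaike weights: exp(-AIC/2), normalized over the candidate set
akaike_weights = np.exp(-0.5 * (aic - aic.min()))
akaike_weights /= akaike_weights.sum()

print(np.allclose(post_from_bic, akaike_weights))  # True
```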

If one were to approach this as a Bayesian problem and ignore information theory, then the $\log(N)$ comes from the equal prior weighting of the models.
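To sketch that step (the standard Laplace-approximation argument, added here only for illustration): for model $M_i$ with $k_i$ parameters and maximum-likelihood estimate $\hat{\theta}_i$,

$$\log \Pr(y \mid M_i) = \log \int \Pr(y \mid \theta, M_i)\,\Pr(\theta \mid M_i)\,d\theta \approx \log \Pr(y \mid \hat{\theta}_i, M_i) - \frac{k_i}{2}\log(N),$$

up to terms that stay bounded as $N$ grows, so $-2\log \Pr(y \mid M_i) \approx \operatorname{BIC}_i$. The $\tfrac{k_i}{2}\log(N)$ term comes from integrating over the parameters, and because the prior over the models themselves is flat, $\Pr(M_i)=I^{-1}$, nothing reweights it: the posterior model probabilities are approximately proportional to $\exp(-\tfrac{1}{2}\operatorname{BIC}_i)$.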

The BIC assumes that you lack any information that one model is better than another. The AIC assumes that you do have prior information, not so much about any specific model as about complexity. The AIC imposes a prior assumption that simpler models are more likely than complex models to be the true model.

The BIC gets its name from the thought process used to derive it. Don't get hung up on the name.

The strange result is that, by imposing a flat prior, the BIC becomes a biased estimator, while by imposing an informative prior, the AIC becomes an unbiased one. That result runs contrary to standard thinking about what is required to get an unbiased estimator.

Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). New York: Springer-Verlag.

Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304.

Answered by Dave Harris on January 30, 2021
