
Exploding probability under simple hierarchical Bayesian formulation

Cross Validated, asked on November 2, 2021

I am wondering if someone here can clear up a point of confusion that I have when applying MCMC or an optimization method to hierarchical Bayesian problems. Let’s say we have a likelihood and prior with the following form:

$N(x_{i,j}|v_j, \sigma^2) \, N(v_j|\mu_v, \sigma_v^2) \, N(\mu_v|0, 1) \, p(\sigma_v) \, p(\sigma)$

Where $N$ denotes a normal distribution and $p(\cdot)$ is some other, unspecified distribution (e.g. half-normal). The subscript $i$ indexes a repeated trial, while the subscript $j$ indexes a grouping variable. For example, $x_{i,j}$ is a button-press time, where $i$ indexes a single response from subject $j$.
It seems to me that there are two ways to maximize the summed log probability here:

  1. One is that $x$ constrains the individual $v_j$, which in turn constrain $\mu_v$ and $\sigma_v$ to their appropriate values.
  2. Another is that $\sigma_v$ collapses to 0, (all) $v_j$ collapse to $\mu_v$, and $\sigma$ expands to cater to $x$. In this case the log probability of $N(v_j|\mu_v, \sigma_v^2)$ explodes, since every $v_j$ is perfectly explained by $v_j = \mu_v$, and the density at the mode grows without bound as $\sigma_v \to 0$ (see the numeric sketch below).
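
To make the explosion in point 2 concrete, here is a minimal numeric sketch (Python with NumPy/SciPy; the data and parameter values are made up, and the unspecified priors $p(\sigma_v)$, $p(\sigma)$ are omitted):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=0.05, size=(20, 5))  # 20 trials for each of 5 subjects

mu_v = x.mean()          # scenario 2: every v_j collapses onto the grand mean
v = np.full(5, mu_v)
sigma = x.std()          # sigma expands to absorb all the variation in x

for sigma_v in [1e-1, 1e-3, 1e-6, 1e-9]:
    # Joint log density of the normal terms; the v_j term equals
    # -len(v) * log(sigma_v * sqrt(2*pi)), which grows without bound.
    logp = (norm.logpdf(x, loc=v, scale=sigma).sum()
            + norm.logpdf(v, loc=mu_v, scale=sigma_v).sum()
            + norm.logpdf(mu_v, loc=0.0, scale=1.0))
    print(f"sigma_v = {sigma_v:.0e}   joint log prob = {logp:.1f}")
```

Shrinking $\sigma_v$ adds roughly $-5 \log \sigma_v$ to the total, so the "solution" in 2 can be made arbitrarily good.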

The solution that is typically wanted is 1, but because of the "exploding" nature of 2 I don't see how to avoid it in this formulation (without making strong assumptions about the priors). So I am wondering whether the problem definition is simply not constrained enough, whether I have a conceptual misunderstanding, or something else.

Thank you –

One Answer

Whether 1. or 2. happens depends on the specifics of the priors $p(\sigma_v)$ and $p(\sigma)$.

In general, the Bayesian approach "shrinks" the likelihood-based estimates towards the priors. If the priors are "loose" (i.e. they assign non-negligible probability to a wide range of values), then the shrinkage is low and the likelihood dominates the posterior. If your likelihoods are also "loose", then the algorithms used to obtain the posterior through simulation (such as MCMC, HMC, etc.) have a hard time. In your case, 2. happens.

If, on the other hand, the priors are more specific, then the posterior is a mix between the likelihood and the prior. In that case, 1. happens.

To be precise, both 1. and 2. will happen, to the extent that the priors are more or less specific. The tighter the priors, the more pronounced the effect of 1. on the posterior; the looser they are, the more the effect of 2. dominates.
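
To make this concrete, here is a minimal sketch of your model with a tight prior on $\sigma_v$ (assuming PyMC; the data, the half-normal family, and the scale values are placeholders, not recommendations):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=0.05, size=(20, 5))  # placeholder data: 20 trials, 5 subjects

with pm.Model():
    mu_v = pm.Normal("mu_v", mu=0.0, sigma=1.0)
    # A half-normal with a small scale concentrates prior mass at modest
    # values of sigma_v, so the degenerate sigma_v -> 0 region carries
    # little posterior mass and scenario 1 dominates.
    sigma_v = pm.HalfNormal("sigma_v", sigma=0.1)
    sigma = pm.HalfNormal("sigma", sigma=0.1)
    v = pm.Normal("v", mu=mu_v, sigma=sigma_v, shape=5)
    pm.Normal("x_obs", mu=v, sigma=sigma, observed=x)
    idata = pm.sample()
```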

Response to comments

It seems as though you have little in the way of prior information about the free variables in your model (i.e. $\sigma_v$ and $\sigma$). Possible solutions are:

  • Refactor your model so it depends on free variables about which you have better intuition or more consistent prior information. For instance, refactor your model so that the free variable is a general mean response time; in that case, you can set your priors based on a meta-analysis, for instance; or
  • Run a sensitivity analysis to understand how much the posterior is influenced by the center of the probability mass of your priors. For instance, you can set up three scenarios:
    1. $\sigma_v$ centered around $10^{-3}$
    2. $\sigma_v$ centered around $10^{-2}$
    3. $\sigma_v$ centered around $1$

and see whether they change the posteriors appreciably (sketched in code below).
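
A minimal sketch of that sensitivity loop (assuming PyMC and ArviZ; a half-normal whose scale parameter equals $s$ is used as a stand-in for "$\sigma_v$ centered around $s$", and the data are made up):

```python
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=0.05, size=(20, 5))  # placeholder data

for s in [1e-3, 1e-2, 1.0]:  # the three scenarios above
    with pm.Model():
        mu_v = pm.Normal("mu_v", mu=0.0, sigma=1.0)
        sigma_v = pm.HalfNormal("sigma_v", sigma=s)  # prior scale under test
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        v = pm.Normal("v", mu=mu_v, sigma=sigma_v, shape=5)
        pm.Normal("x_obs", mu=v, sigma=sigma, observed=x)
        idata = pm.sample(progressbar=False)
    print(f"prior scale {s}:")
    print(az.summary(idata, var_names=["mu_v", "sigma_v"]))
```

If the posterior summaries are stable across the three runs, the priors are not driving the result.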

Answered by LmnICE on November 2, 2021
