Laplace approximation in high-dimensions

Cross Validated Asked by Dionysis M on December 20, 2021

Obviously, computing the inverse Hessian is hard when a probability distribution is fitted to high-dimensional data points. One idea to reduce the computational cost would be to approximate the distribution with a diagonal Gaussian. Are there any standard techniques for making efficient use of the Laplace approximation in high dimensions?

2 Answers

Are there any standard techniques for making efficient use of the Laplace approximation in high dimensions?

Assuming that you are using a Laplace approximation for a random-effects model, groups of outcomes are often conditionally independent given the random effects. These models have a sparse Hessian, which makes it easy to work with high-dimensional problems. As an example, consider the following mixed logistic regression

$$
\begin{align}
y_{ij} &\sim \text{Bin}\!\left(\text{logit}^{-1}(b_i + u_i x_{ij}), 1\right) & j &= 1, \dots, 5 \\
u_i &\sim N(0, \sigma^2_u) & i &= 1, \dots, k \\
b_i &\sim N(0, \sigma^2_b) & i &= 1, \dots, k
\end{align}
$$

We can simulate from this model with $k = 3$ (drawing the covariates $x_{ij}$ uniformly on $(-1, 1)$) and print the Hessian as follows

# simulate data
n <- 5      # observations per group
k <- 3      # number of groups
sig <- 1    # common standard deviation of the random effects (sigma_u = sigma_b = 1)
set.seed(1)
dat <- do.call(rbind, lapply(1:k, function(i){
  u <- rnorm(2, sd = sig)                        # random intercept and slope for group i
  x <- runif(n, -1, 1)                           # covariate
  y <- 1/(1 + exp(-u[1] - u[2] * x)) < runif(n)  # binary outcomes
  data.frame(y = y, x = x, grp = i)
}))

# approximate the Hessian of the joint log-likelihood (outcomes and random effects)
library(numDeriv)
ll <- function(u){
  # u stacks the random intercept and slope for each group
  i_start <- (dat$grp - 1) * 2
  p_hat <- 1/(1 + exp(-u[i_start + 1] - dat$x * u[i_start + 2]))
  sum(log(ifelse(dat$y > 0, p_hat, 1 - p_hat))) +
    sum(dnorm(u, sd = sig, log = TRUE))
}
he <- hessian(ll, numeric(2 * k))
he[abs(he) < 1e-3] <- 0 # threshold just as an example here
print(he, digits = 2)
#R>       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
#R> [1,] -2.25 -0.42  0.00  0.00  0.00  0.00
#R> [2,] -0.42 -1.49  0.00  0.00  0.00  0.00
#R> [3,]  0.00  0.00 -2.25 -0.23  0.00  0.00
#R> [4,]  0.00  0.00 -0.23 -1.39  0.00  0.00
#R> [5,]  0.00  0.00  0.00  0.00 -2.25 -0.23
#R> [6,]  0.00  0.00  0.00  0.00 -0.23 -1.32

For this model, the Hessian is a block-diagonal matrix with $2 \times 2$ blocks, so it is easy to invert and its determinant is cheap to compute. The number of non-zero entries is only $2^2 k$ rather than $(2k)^2$. This structure is the basis of much software, such as the lme4 and TMB packages.
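
As a rough sketch, this structure can be exploited with the Matrix package's sparse routines, so that the log-determinant and the solves needed for the Laplace approximation never require a dense inverse:

# use a sparse representation of the negative Hessian from the example above
library(Matrix)
H <- Matrix(-he, sparse = TRUE)                       # positive definite here
log_det <- determinant(H, logarithm = TRUE)$modulus   # log|H| via a sparse factorization
v <- solve(H, rep(1, 2 * k))                          # solve H x = b without forming H^{-1}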

One idea to reduce the computational cost would be to approximate the distribution with a diagonal Gaussian.

There are of course settings where the Hessian is not sparse. These include some Gaussian processes and autoencoders. Assuming a diagonal covariance matrix is akin to the Gaussian variational approximations used in many machine learning applications. However, I suspect that one might opt for some low-rank approximation of the determinant instead.
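
As a rough sketch of the diagonal idea, one could compute only the $d$ diagonal entries of the Hessian, one univariate second derivative per coordinate, and use the implied diagonal Gaussian. Reusing ll from the example above and, for simplicity, pretending the mode is at zero:

d <- 2 * k
mode_hat <- numeric(d)  # stand-in for the mode; in practice locate it by optimization
h_diag <- vapply(seq_len(d), function(j) {
  f_j <- function(z) ll(replace(mode_hat, j, z))  # vary coordinate j only
  drop(hessian(f_j, mode_hat[j]))
}, numeric(1))
sd_diag <- sqrt(-1 / h_diag)        # standard deviations of the diagonal Gaussian
log_det_diag <- sum(log(-h_diag))   # log-determinant under the diagonal assumption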

Answered by Benjamin Christoffersen on December 20, 2021

You may want to look into the SR1 and BFGS (and its dual, DFP) optimization methods. These iteratively update an approximation of the inverse Hessian using rank-1 (SR1) and rank-2 (BFGS) updates. They are fairly efficient and can likely handle your problem.
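
As a rough sketch, the BFGS rank-2 update of a stored inverse-Hessian approximation looks like this, given the step s = x_new - x_old and the gradient change g = grad_new - grad_old:

bfgs_inv_update <- function(H_inv, s, g) {
  rho <- 1 / sum(g * s)
  I <- diag(length(s))
  (I - rho * s %*% t(g)) %*% H_inv %*% (I - rho * g %*% t(s)) +
    rho * s %*% t(s)
}

optim(method = "BFGS") applies this kind of update internally, and for very high dimensions limited-memory variants such as L-BFGS store only the last few (s, g) pairs instead of a full matrix.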

Answered by kurtosis on December 20, 2021
