# How to impose restrictions on a random matrix via its prior distribution?

Cross Validated Asked by SOULed_Outt on November 27, 2020

I am reading the paper *Factor analysis and outliers: A Bayesian approach*. The author starts with a factor analysis model given by
$$\mathbf{y}_i = \mathbf{\Lambda} \mathbf{z}_i + \mathbf{e}_i, \quad i = 1, \ldots, n,$$
where each $$\mathbf{y}_i$$ is a $$p$$-dimensional observation vector, each $$\mathbf{z}_i$$ is a $$K$$-dimensional latent factor vector, and $$\mathbf{\Lambda}$$ is a $$p \times K$$ full-rank matrix of factor loadings. The author assumes that the factors and the error term are Normal:
$$\mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{\Phi})$$
$$\mathbf{e}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{\Psi})$$

The author assigns Wishart priors to $$\mathbf{\Phi}^{-1}$$ and $$\mathbf{\Psi}^{-1}$$:
$$\mathbf{\Phi}^{-1} \sim \mathcal{W}_K \left( \mathbf{\Phi}_{*}, \nu_{*} \right)$$
$$\mathbf{\Psi}^{-1} \sim \mathcal{W}_p \left( \mathbf{\Psi}_{*}, n_{*} \right)$$

In the paper the author writes something I found to be quite interesting:

> While classical factor analysis sets $$\mathbf{\Phi} = \mathbf{I}$$ and uses a diagonal $$\mathbf{\Psi}$$ matrix, we impose these restrictions via the prior information matrices $$\mathbf{\Psi}_{*}$$ and $$\mathbf{\Phi}_{*}$$.

Question: What should the values of $$\mathbf{\Psi}_{*}$$ and $$\mathbf{\Phi}_{*}$$ be in order to do what the author is suggesting?

The author does not seem to state exactly how this can be done, but I may have missed it, so I will continue reading. My own research on this matter pointed me to seemingly similar unanswered questions here and here.

UPDATE: I did some research on the Wishart distribution: if you specify that $$\mathbf{\Psi}_*$$ and $$\mathbf{\Phi}_*$$ are diagonal matrices, then $$\mathbb{E}[\mathbf{\Psi}]$$ and $$\mathbb{E}[\mathbf{\Phi}]$$ will be diagonal mean matrices. Perhaps this is what the author is referring to. Still unsure, though.

UPDATE 2: I set $$\mathbf{\Psi}_*$$ and $$\mathbf{\Phi}_*$$ to diagonal matrices and ran simulations in R, but the results are not what I expected. The simulated matrices I obtained are not diagonal, so I think I misinterpreted the author's statement. I thought that the factor analysis model with the prior distributions above would reduce to the classical factor analysis model for certain hyper-parameter values, but it seems that this formulation does not produce the classical factor analysis model.
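The behavior described in this update is easy to reproduce: a diagonal scale matrix makes the Wishart *mean* diagonal, but individual draws still have non-zero off-diagonal entries. A minimal Python sketch with `scipy.stats.wishart`, using hypothetical values ($$p = 3$$, $$\nu = 10$$, identity scale):

```python
import numpy as np
from scipy.stats import wishart

# Hypothetical hyper-parameters: diagonal (identity) scale and 10 degrees of freedom.
p = 3
nu = 10
scale = np.eye(p)

rng = np.random.default_rng(0)
draws = wishart.rvs(df=nu, scale=scale, size=5000, random_state=rng)

# The Wishart mean is nu * scale, hence diagonal here ...
print(np.round(draws.mean(axis=0), 2))   # close to 10 * I

# ... but any single draw has non-zero off-diagonal entries.
off_diag = draws[0][~np.eye(p, dtype=bool)]
print(np.abs(off_diag).max() > 0)        # True
```

So a diagonal $$\mathbf{\Psi}_*$$ constrains the prior mean, not the support: simulated covariance matrices will not themselves be diagonal.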

UPDATE 3: The classical factor analysis model sets $$\mathbf{\Phi} = \mathbf{I}$$ (i.e. non-random), takes $$\mathbf{\Psi}$$ to be a random diagonal matrix, and assigns prior distributions only to its diagonal elements. I understand the author's statement to mean that I can accomplish all of this by using Wishart priors on $$\mathbf{\Phi}^{-1}$$ and $$\mathbf{\Psi}^{-1}$$ with special scale matrices $$\mathbf{\Phi}_*$$ and $$\mathbf{\Psi}_*$$.

The inverse Wishart (which is what the mentioned article's Wishart priors on the precision matrices amount to) is used as a prior for the covariance matrix of a multivariate Normal random variable.

This choice is based on the fact that it is the conjugate prior for the covariance matrix in this scenario.

If $$\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$$ with each $$\mathbf{x}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma})$$, and the prior is $$\mathbf{\Sigma} \sim \mathcal{W}^{-1}(\mathbf{\Psi}, \nu)$$, then the posterior $$\mathbf{\Sigma} \mid \mathbf{X} \sim \mathcal{W}^{-1}(\mathbf{A} + \mathbf{\Psi}, n + \nu)$$ is also inverse-Wishart distributed, where $$\mathbf{A} = \mathbf{X}\mathbf{X}^{\top}$$ and $$n$$ is the number of observations.
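This conjugate update can be sketched numerically; the dimensions, hyper-parameters, and data below are all hypothetical:

```python
import numpy as np
from scipy.stats import invwishart

# Hypothetical setup: p-dimensional zero-mean Normal data, inverse-Wishart prior.
p, n = 3, 50
nu = p + 2                      # prior degrees of freedom
Psi = np.eye(p)                 # prior scale matrix

rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n).T   # p x n data matrix

A = X @ X.T                     # sufficient statistic X X^T
# Posterior is inverse-Wishart with updated scale A + Psi and df n + nu:
post_mean = invwishart.mean(df=nu + n, scale=Psi + A)

# Closed form: (Psi + A) / (nu + n - p - 1)
print(np.round(post_mean, 2))
```

With enough data, the posterior mean is dominated by $$\mathbf{A}/n$$, the sample covariance, as one would expect from conjugacy.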

That said, one can impose structure through the prior for the covariance matrix by setting the prior scale matrix $$\mathbf{\Psi}$$ appropriately. In the article, the author sets the scale matrix $$\mathbf{\Psi} = \mathbf{\Psi}^*$$ to be diagonal.

An alternative approach would have been to force the $$p$$ variables to be independently Normal-distributed; in that case, the conjugate prior for the variance of each dimension would have been the inverse gamma.
The limitation of the latter is that it forces the $$p$$ variables to be independent a posteriori, while with an inverse Wishart the off-diagonal elements of the covariance matrix have non-zero probability of being non-zero.
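For contrast, the independent-variances alternative reduces to a one-dimensional conjugate update per coordinate: for zero-mean Normal data with $$\sigma^2 \sim \text{InvGamma}(\alpha, \beta)$$, the posterior is $$\text{InvGamma}(\alpha + n/2,\; \beta + \sum_i x_i^2/2)$$. A sketch with hypothetical hyper-parameters $$\alpha$$, $$\beta$$:

```python
import numpy as np
from scipy.stats import invgamma

# Hypothetical hyper-parameters for one coordinate's variance prior.
alpha, beta = 3.0, 2.0
rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.5, size=100)       # data for a single dimension

# Conjugate update for a zero-mean Normal with unknown variance:
alpha_post = alpha + len(x) / 2
beta_post = beta + np.sum(x**2) / 2
post = invgamma(a=alpha_post, scale=beta_post)

print(round(post.mean(), 2))             # posterior mean of the variance
```

Because each coordinate gets its own scalar prior, no posterior dependence between coordinates can ever arise, which is exactly the limitation noted above.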

When the scale matrix $$\mathbf{\Psi}^*$$ is diagonal and $$\nu = p + 1$$, the correlations in $$\mathbf{\Sigma}$$ have marginally uniform distributions (see par. 2.1 of https://arxiv.org/pdf/1408.4050.pdf). This corresponds to a non-informative prior for the correlations, implying that non-zero posterior correlations require strong evidence from the data $$\mathbf{X}$$.
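This marginally-uniform property can be checked by simulation; a sketch assuming $$p = 3$$, an identity scale matrix, and $$\nu = p + 1 = 4$$ (a Uniform(-1, 1) marginal would have mean 0 and variance 1/3):

```python
import numpy as np
from scipy.stats import invwishart

# Inverse-Wishart with nu = p + 1 and diagonal (identity) scale.
p = 3
draws = invwishart.rvs(df=p + 1, scale=np.eye(p), size=20000,
                       random_state=np.random.default_rng(3))

# Marginal correlation between the first two coordinates for each draw.
corr = draws[:, 0, 1] / np.sqrt(draws[:, 0, 0] * draws[:, 1, 1])

# Compare empirical moments against Uniform(-1, 1): mean 0, variance 1/3.
print(round(corr.mean(), 2), round(corr.var(), 2))
```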

An interesting alternative, suggested by Gelman, is to use half-Cauchy priors (the linked article focuses on one-dimensional hierarchical models):

http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf

Correct answer by ping on November 27, 2020
