# How to impose restrictions on a random matrix via its prior distribution?

Cross Validated Asked by SOULed_Outt on November 27, 2020

I am reading the paper Factor analysis and outliers: A Bayesian approach. The author starts with a factor analysis model given by
$${\bf y}_i = {\bf \Lambda} {\bf z}_i + {\bf e}_i, \quad i = 1, \ldots, n,$$
where each $${\bf y}_i$$ is a $$p$$-dimensional observation vector, each $${\bf z}_i$$ is a $$K$$-dimensional latent factor vector, and $${\bf \Lambda}$$ is a $$p \times K$$ full-rank matrix of factor loadings. The author assumes that the factors and the error term are Normal:
$${\bf z}_i \sim \mathcal{N} ({\bf 0}, {\bf \Phi})$$
$${\bf e}_i \sim \mathcal{N} ({\bf 0}, {\bf \Psi})$$

The author assigns Wishart priors to $${\bf \Phi}^{-1}$$ and $${\bf \Psi}^{-1}$$:
$${\bf \Phi}^{-1} \sim \mathcal{W}_K \left( {\bf \Phi}_{*}, \nu_{*} \right)$$
$${\bf \Psi}^{-1} \sim \mathcal{W}_p \left( {\bf \Psi}_{*}, n_{*} \right)$$

In the paper the author writes something I found to be quite interesting:

While classical factor analysis sets $${\bf \Phi} = {\bf I}$$ and uses a diagonal $${\bf \Psi}$$ matrix, we impose these restrictions via the prior information matrices $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$.

Question: What should the values of $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$ be in order to do what the author is suggesting?

The author does not seem to state exactly how this can be done, but I may have missed it, so I will continue reading. My own research on this matter pointed me to these seemingly similar unanswered questions here and here.

UPDATE: I did some research on the Wishart distribution, and if you specify that $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$ are two diagonal matrices, then $$\mathbb{E} [{\bf \Psi}]$$ and $$\mathbb{E} [{\bf \Phi}]$$ will be two diagonal mean matrices. Perhaps this is what the author is referring to. Still unsure, though.

UPDATE 2: I set $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$ to diagonal matrices and ran simulations in R, but the results aren't what I expected. The simulated matrices I obtained are not diagonal, so I think I misinterpreted the author's statement. I thought that if you formulate the factor analysis model with the prior distributions above, you could recover the classical factor analysis model by choosing certain hyper-parameter values. But it seems that this formulation does not produce the classical factor analysis model.
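The behaviour described in this update can be reproduced with a short simulation. What follows is a minimal Python sketch (not the original R code), assuming integer degrees of freedom so that a Wishart draw can be built as a sum of outer products of Normal vectors. The helper `sample_wishart`, the scale matrix, and the degrees of freedom are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_wishart(scale, df, size, rng):
    """Draw `size` matrices W ~ Wishart(scale, df) for integer df."""
    p = scale.shape[0]
    chol = np.linalg.cholesky(scale)          # scale = chol @ chol.T
    z = rng.standard_normal((size, df, p))    # i.i.d. N(0, 1) entries
    x = z @ chol.T                            # each row is N(0, scale)
    return x.transpose(0, 2, 1) @ x           # W = sum of outer products

rng = np.random.default_rng(0)
scale = np.diag([1.0, 2.0, 0.5])   # hypothetical diagonal scale matrix
df = 5                             # hypothetical degrees of freedom

draws = sample_wishart(scale, df, 20000, rng)
single = draws[0]                  # one simulated matrix
mean = draws.mean(axis=0)          # Monte Carlo estimate of E[W]
off_diag = ~np.eye(3, dtype=bool)  # mask selecting off-diagonal entries

print("one draw:\n", single.round(2))
print("Monte Carlo mean:\n", mean.round(2))
print("theoretical mean df * scale:\n", df * scale)
```

Each individual draw has non-zero off-diagonal entries, while the average over many draws is close to the diagonal matrix `df * scale`. A diagonal scale matrix therefore makes the prior mean diagonal, not every realization.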

UPDATE 3: The classical factor analysis model sets $${\bf \Phi} = {\bf I}$$ (i.e. non-random), takes $${\bf \Psi}$$ to be a random diagonal matrix, and assigns prior distributions only to its diagonal elements. What I understand the author's statement to mean is that I can achieve the same thing by using Wishart priors on $${\bf \Phi}^{-1}$$ and $${\bf \Psi}^{-1}$$ with special scale matrices $${\bf \Phi}_{*}$$ and $${\bf \Psi}_{*}$$.

The inverse-Wishart distribution (used in the article, equivalently as a Wishart prior on the precision matrix) is a standard prior for the covariance matrix of a multivariate Normal random variable.

This choice is motivated by the fact that it is the conjugate prior for the covariance matrix in this setting.

If the columns of $$\mathbf{X}=(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$$ are i.i.d. $$\mathcal{N}(\mathbf{0}, \mathbf{\Sigma})$$, with a prior $$\mathbf{\Sigma} \sim \mathcal{W}^{-1}(\mathbf{\Psi}, \nu)$$, then the posterior $$\mathbf{\Sigma} \mid \mathbf{X} \sim \mathcal{W}^{-1}(\mathbf{A}+\mathbf{\Psi}, n+\nu)$$ is also inverse-Wishart, where $$\mathbf{A}=\mathbf{X}\mathbf{X}^t$$ and $$n$$ is the number of observations in $$\mathbf{X}$$.
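The conjugate update is just two additions, which a minimal Python sketch can make concrete. The function name `inverse_wishart_posterior`, the simulated data, and the "true" covariance used to generate it are hypothetical and only serve as an illustration. The posterior mean uses the standard inverse-Wishart result, scale divided by (degrees of freedom minus $$p$$ minus 1), which holds when the degrees of freedom exceed $$p+1$$.

```python
import numpy as np

def inverse_wishart_posterior(X, prior_scale, prior_df):
    """Conjugate update for zero-mean Normal data.

    X has shape (p, n): each column is one observation.
    Returns the posterior scale matrix and degrees of freedom.
    """
    p, n = X.shape
    A = X @ X.T                        # scatter matrix of the data
    return prior_scale + A, prior_df + n

rng = np.random.default_rng(1)
p, n = 3, 200
# Hypothetical covariance with one non-zero correlation, used to simulate data.
true_sigma = np.array([[1.0, 0.6, 0.0],
                       [0.6, 1.0, 0.0],
                       [0.0, 0.0, 2.0]])
X = rng.multivariate_normal(np.zeros(p), true_sigma, size=n).T

prior_scale = np.eye(p)                # diagonal prior scale matrix
prior_df = p + 1

post_scale, post_df = inverse_wishart_posterior(X, prior_scale, prior_df)
post_mean = post_scale / (post_df - p - 1)   # valid because post_df > p + 1

print("posterior degrees of freedom:", post_df)
print("posterior mean of Sigma:\n", post_mean.round(2))
```

Even though the prior scale matrix is diagonal, the posterior scale matrix picks up the off-diagonal entries of $$\mathbf{A}$$, so the posterior can recover correlations that are present in the data.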

That said, one can impose structure on the prior for the covariance matrix by choosing the prior scale matrix $$\mathbf{\Psi}$$ appropriately. In the article, the authors set $$\mathbf{\Psi}=\mathbf{\Psi}^*$$ to be diagonal.

An alternative approach would have been to force the $$p$$ variables to be independently Normal-distributed. In that case, the conjugate prior for the variance of each dimension would have been the inverse Gamma.
The limitation of the latter is that it forces the $$p$$ variables to remain independent in the posterior, whereas with an inverse Wishart the off-diagonal elements of the covariance matrix have non-zero probability of being non-zero.
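For comparison, the independent alternative can be sketched as follows. This is a minimal Python illustration, assuming zero-mean data and an inverse-Gamma prior with shape `a` and scale `b` on each variance, in which case the standard conjugate update gives shape `a + n/2` and scale `b + sum(x^2)/2`. The function name, the toy data, and the prior values are hypothetical.

```python
import numpy as np

def inverse_gamma_posterior(x, shape, scale):
    """Posterior (shape, scale) for the variance of zero-mean Normal data x,
    given an inverse-Gamma(shape, scale) prior on that variance."""
    x = np.asarray(x, dtype=float)
    return shape + x.size / 2.0, scale + 0.5 * np.sum(x ** 2)

# Toy data: p = 2 variables (rows), n = 4 observations (columns).
X = np.array([[1.0, -1.0, 2.0, 0.0],
              [0.5, 0.5, -0.5, -0.5]])

# Each dimension is updated on its own; no cross-products between rows
# are ever used, so no covariance can appear in the posterior.
posteriors = [inverse_gamma_posterior(row, 2.0, 1.0) for row in X]
print(posteriors)
```

Because each row is processed separately, the update never looks at products between different variables, which is exactly why this formulation cannot express posterior correlation.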

When the scale matrix $$\mathbf{\Psi}^*$$ is diagonal and $$\nu=p+1$$, each correlation in $$\mathbf{\Sigma}$$ has a marginal uniform distribution (par. 2.1 https://arxiv.org/pdf/1408.4050.pdf). This corresponds to a non-informative prior for the correlations: it is centered at zero without forcing them to be zero, so any non-zero posterior correlation has to be supported by the data $$\mathbf{X}$$.
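The uniform marginal can be checked numerically. Below is a minimal Python sketch, assuming integer degrees of freedom so that inverse-Wishart draws can be obtained by inverting Wishart draws built from Normal vectors. The helper `sample_inverse_wishart`, the dimension, and the diagonal scale matrix are hypothetical choices for illustration.

```python
import numpy as np

def sample_inverse_wishart(scale, df, size, rng):
    """Draw `size` matrices Sigma ~ InvWishart(scale, df) for integer df."""
    p = scale.shape[0]
    # If W ~ Wishart(scale^{-1}, df), then W^{-1} ~ InvWishart(scale, df).
    chol = np.linalg.cholesky(np.linalg.inv(scale))
    z = rng.standard_normal((size, df, p))
    x = z @ chol.T                       # each row is N(0, scale^{-1})
    W = x.transpose(0, 2, 1) @ x         # Wishart draws
    return np.linalg.inv(W)

rng = np.random.default_rng(2)
p = 3
scale = np.diag([1.0, 4.0, 0.25])        # hypothetical diagonal scale matrix

sigma = sample_inverse_wishart(scale, p + 1, 50000, rng)

# Correlation between variables 0 and 1 in each sampled covariance matrix.
sd = np.sqrt(np.diagonal(sigma, axis1=1, axis2=2))
corr01 = sigma[:, 0, 1] / (sd[:, 0] * sd[:, 1])

# Under a uniform distribution on (-1, 1), each of 10 equal bins holds ~10%.
hist, _ = np.histogram(corr01, bins=10, range=(-1, 1))
freq = hist / corr01.size
print("bin frequencies:", freq.round(3))
print("mean:", corr01.mean().round(3), "variance:", corr01.var().round(3))
```

A uniform distribution on $$(-1, 1)$$ has mean 0 and variance $$1/3$$, so a roughly flat histogram with those moments is consistent with the claim. Increasing the degrees of freedom beyond $$p+1$$ in this sketch concentrates the correlations around zero.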

An interesting alternative, suggested by Gelman, is to use half-Cauchy priors (the linked article focuses on one-dimensional hierarchical models):

http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf

Correct answer by ping on November 27, 2020
