# Question about the right inverse method in a GLM of order 2

Cross Validated Asked by Suzee on December 15, 2020

I have taken a course in regression analysis. I learned that the equation $$\beta = (X'X)^{-1}X'y$$ can be used to find the weights in a linear model.

When learning about GLMs, I came across this formula that can be used when $$X'X$$ is not invertible while $$XX'$$ is.

$$\beta = X'(XX')^{-1}y$$

For example, if the generalized linear model is $$y(x_1,x_2) = b_0 +b_1x_1+b_2x_2+b_3x_1x_2+b_4x_1^2+b_5x_2^2$$, and there are, say, four data points, then there would be correlation between the columns and so we have to use the "right inverse method".

My question is, where did the equation $$\beta = X'(XX')^{-1}y$$ come from? I tried searching for the derivation, but could not find anything (which is why I could not come up with a nice title for this post; I don't know what keywords to use). I suspect it is derived by taking partial derivatives, as for the usual $$\beta = (X'X)^{-1}X'y$$ equation, but I cannot see how it works.

Your question is a bit unclear to me. However, as far as I can understand...

if the generalized linear model is $$y(x_1,x_2)=b_0+b_1x_1+b_2x_2+b_3x_1x_2+b_4x^2_1+b_5x^2_2$$, and there are, say, four data points, then

...then your model matrix $$X$$ is a $$4\times 6$$ matrix:

$$\begin{bmatrix} 1 & x_{11} & x_{21} & x_{11}x_{21} & x_{11}^2 & x_{21}^2 \\ 1 & x_{12} & x_{22} & x_{12}x_{22} & x_{12}^2 & x_{22}^2 \\ 1 & x_{13} & x_{23} & x_{13}x_{23} & x_{13}^2 & x_{23}^2 \\ 1 & x_{14} & x_{24} & x_{14}x_{24} & x_{14}^2 & x_{24}^2 \end{bmatrix}$$

If $$\text{rank}(X)=4$$, then $$X^TX$$ is a $$6\times 6$$ singular matrix, while $$XX^T$$ is a $$4\times 4$$ nonsingular matrix.
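The rank claim can be checked numerically. The following is only a sketch with NumPy; the four data points are made up for illustration and are not from the question.

```python
import numpy as np

# Four hypothetical data points (x1, x2), chosen so that rank(X) = 4
x1 = np.array([0.0, 1.0, 0.0, 2.0])
x2 = np.array([0.0, 0.0, 1.0, 3.0])

# 4 x 6 model matrix with columns 1, x1, x2, x1*x2, x1^2, x2^2
X = np.column_stack([np.ones(4), x1, x2, x1 * x2, x1**2, x2**2])

rank_X = np.linalg.matrix_rank(X)          # rank of the model matrix
rank_XtX = np.linalg.matrix_rank(X.T @ X)  # rank of the 6 x 6 matrix X'X
rank_XXt = np.linalg.matrix_rank(X @ X.T)  # rank of the 4 x 4 matrix XX'

print(X.shape, rank_X, rank_XtX, rank_XXt)
```

Both products have the same rank as $$X$$, so the $$6\times 6$$ matrix cannot have full rank, while the $$4\times 4$$ matrix does.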

You would normally need $$(X^TX)^{-1}$$ to estimate $$\beta$$, but since $$X^TX$$ is singular you have to use a right inverse, i.e. a matrix $$X_R$$ such that $$XX_R=I$$:

$$X\beta=y,\qquad X\beta=XX_Ry,\qquad \beta=X_Ry$$

When $$XX^T$$ is invertible, a right inverse that always yields a solution is $$X_R=X^T(XX^T)^{-1}$$, since $$XX_R=XX^T(XX^T)^{-1}=I$$. See Wikipedia and Cherkassky & Mulier, Learning From Data: Concepts, Theory, and Methods, John Wiley & Sons, 2007, Appendix B.
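As a rough illustration of the right inverse, here is a minimal NumPy sketch. The data points and responses are hypothetical, and the variable names `X_R` and `beta` simply mirror the notation used here.

```python
import numpy as np

# Hypothetical 4 x 6 model matrix: columns 1, x1, x2, x1*x2, x1^2, x2^2
x1 = np.array([0.0, 1.0, 0.0, 2.0])
x2 = np.array([0.0, 0.0, 1.0, 3.0])
X = np.column_stack([np.ones(4), x1, x2, x1 * x2, x1**2, x2**2])
y = np.array([1.0, 2.0, 0.5, 3.0])  # made-up responses

# Right inverse X_R = X'(XX')^{-1}. Solving (XX') Z = X gives Z = (XX')^{-1} X,
# and Z' = X'(XX')^{-1} because XX' is symmetric.
X_R = np.linalg.solve(X @ X.T, X).T

beta = X_R @ y  # one solution of X beta = y

print(np.allclose(X @ X_R, np.eye(4)))            # X X_R = I
print(np.allclose(X @ beta, y))                   # the fit is exact
print(np.allclose(beta, np.linalg.pinv(X) @ y))   # agrees with the pseudoinverse
```

With six unknowns and four equations, $$X\beta=y$$ has infinitely many solutions. This right inverse picks the one with the smallest Euclidean norm: minimizing $$\|\beta\|^2$$ subject to $$X\beta=y$$ with Lagrange multipliers gives $$\beta=X^T\lambda$$ and $$XX^T\lambda=y$$, hence $$\beta=X^T(XX^T)^{-1}y$$. That is also why it matches the Moore-Penrose pseudoinverse when $$X$$ has full row rank.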

Correct answer by Sergio on December 15, 2020
