Cross Validated Asked by Suzee on December 15, 2020

I have taken a course in regression analysis. I learned that the equation $\beta = (X'X)^{-1}X'y$ can be used to find the weights in a linear model.

When learning about GLMs, I came across this formula that can be used when $(X'X)$ is not invertible while $XX'$ is.

$$\beta = X'(XX')^{-1}y$$

For example, if the generalized linear model is $y(x_1,x_2) = b_0 +b_1x_1+b_2x_2+b_3x_1x_2+b_4x_1^2+b_5x_2^2$, and there are, say, four data points, then there would be correlation between the columns and so we have to use the "right inverse method".

My question is, where did the equation $\beta = X'(XX')^{-1}y$ come from? I tried searching up the derivation, but could not find anything (which is why I could not come up with a nice title for this post; I don’t know what keywords to use). I suspect it would be by performing partial derivatives, as in the usual $\beta = (X'X)^{-1}X'y$ equation, but I cannot see how it works.

Your question is a bit unclear to me. However, as far as I can understand...

> if the generalized linear model is $y(x_1,x_2)=b_0+b_1x_1+b_2x_2+b_3x_1x_2+b_4x^2_1+b_5x^2_2$, and there are, say, four data points, then

...then your model matrix $X$ is a $4\times 6$ matrix: $$\begin{bmatrix} 1 & x_{11} & x_{21} & x_{11}x_{21} & x_{11}^2 & x_{21}^2 \\ 1 & x_{12} & x_{22} & x_{12}x_{22} & x_{12}^2 & x_{22}^2 \\ 1 & x_{13} & x_{23} & x_{13}x_{23} & x_{13}^2 & x_{23}^2 \\ 1 & x_{14} & x_{24} & x_{14}x_{24} & x_{14}^2 & x_{24}^2 \end{bmatrix}$$

If $\text{rank}(X)=4$, then $X^TX$ is a $6\times 6$ singular matrix, while $XX^T$ is a $4\times 4$ nonsingular matrix.
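The rank claim can be checked numerically. The following is a minimal sketch, assuming NumPy and four made-up data points (the values of `x1` and `x2` are hypothetical, chosen only so that $X$ has rank 4); it is not part of the original question.

```python
import numpy as np

# Hypothetical data: four observations of (x1, x2), chosen only for illustration
x1 = np.array([0.0, 1.0, 0.0, 1.0])
x2 = np.array([0.0, 0.0, 1.0, 2.0])

# Model matrix with columns 1, x1, x2, x1*x2, x1^2, x2^2
X = np.column_stack([np.ones(4), x1, x2, x1 * x2, x1**2, x2**2])

rank_X = np.linalg.matrix_rank(X)          # rank of the 4x6 model matrix
rank_XtX = np.linalg.matrix_rank(X.T @ X)  # rank of the 6x6 matrix X^T X
rank_XXt = np.linalg.matrix_rank(X @ X.T)  # rank of the 4x4 matrix X X^T

print(X.shape)
print(rank_X, rank_XtX, rank_XXt)
```

Since the rank of $X^TX$ can never exceed the rank of $X$, a $6\times 6$ matrix built from four data points cannot have full rank, whereas the $4\times 4$ matrix $XX^T$ can.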

You would need $(X^TX)^{-1}$ to estimate $\beta$, but since $X^TX$ is singular you have to use a *right inverse*, i.e. a matrix $X_R$ such that $XX_R=I$:
$$ X\beta=y,\qquad X\beta=XX_Ry,\qquad \beta=X_Ry$$
A right inverse that always guarantees a solution is $X_R=X^T(XX^T)^{-1}$, since $XX_R=XX^T(XX^T)^{-1}=I$ whenever $XX^T$ is invertible.
See Wikipedia and Cherkassky & Mulier, *Learning From Data: Concepts, Theory, and Methods*, John Wiley & Sons, 2007, Appendix B.
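As for a derivation by partial derivatives, one standard route (offered here as a sketch, not necessarily the one in the references above) is to pick, among all $\beta$ with $X\beta=y$, the one of smallest norm. With the Lagrangian $\tfrac12\beta^T\beta-\lambda^T(X\beta-y)$, setting the derivative with respect to $\beta$ to zero gives $\beta=X^T\lambda$; substituting into $X\beta=y$ gives $\lambda=(XX^T)^{-1}y$, hence $\beta=X^T(XX^T)^{-1}y$. The NumPy snippet below checks this on made-up data (`x1`, `x2` and `y` are hypothetical values).

```python
import numpy as np

# Hypothetical data: four points, six columns (1, x1, x2, x1*x2, x1^2, x2^2)
x1 = np.array([0.0, 1.0, 0.0, 1.0])
x2 = np.array([0.0, 0.0, 1.0, 2.0])
X = np.column_stack([np.ones(4), x1, x2, x1 * x2, x1**2, x2**2])
y = np.array([1.0, 2.0, 0.5, 3.0])  # made-up responses

# beta = X^T (X X^T)^{-1} y, using solve() instead of forming the inverse
beta = X.T @ np.linalg.solve(X @ X.T, y)
print(np.allclose(X @ beta, y))      # the four points are fitted exactly

# The Moore-Penrose pseudoinverse yields the same coefficients
beta_pinv = np.linalg.pinv(X) @ y
print(np.allclose(beta, beta_pinv))

# Adding a null-space vector of X gives another exact solution with a larger norm
_, _, Vt = np.linalg.svd(X)          # rows 4 and 5 of Vt span the null space of X
other = beta + Vt[-1]
print(np.allclose(X @ other, y))
print(np.linalg.norm(beta) < np.linalg.norm(other))
```

The last two checks illustrate that the system has infinitely many exact solutions and that the right-inverse formula selects the one with the smallest Euclidean norm.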

Correct answer by Sergio on December 15, 2020
