
Pseudo-inverse matrix for multivariate linear regression

Cross Validated Asked by SomethingSomething on January 12, 2021

In Andrew Ng’s Machine Learning course lecture 4.6 on "Normal Equation", he says that in order to minimize $J(\theta) = \frac{1}{2m}\sum\limits_{i=1}^{m}(h_{\theta}(x^{(i)}) - y^{(i)})^2$, where $h_{\theta}(x) = \theta_{0} + \theta_{1}x_1 + \theta_{2}x_2 + \dots + \theta_{n}x_n$, and solve for $\theta$, one should take the design matrix $X$ and compute the following expression:

$\theta = (X^{T}X)^{-1}X^{T}y$,

where the design matrix is the matrix whose rows are the feature vectors $[1, x^{(i)}_{1}, x^{(i)}_{2}, \dots, x^{(i)}_{n}]$. He shows the Octave (Matlab) code for computing it as pinv(x'*x)*x'*y.
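For reference, here is a minimal NumPy sketch of the same computation; the toy data and variable names below are made up purely for illustration:

import numpy as np

# Hypothetical toy data: m = 5 examples, n = 2 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 2))
y = rng.normal(size=5)

# Design matrix: a leading column of ones for the intercept term theta_0.
x = np.column_stack([np.ones(len(features)), features])

# Normal equation, mirroring the Octave code pinv(x'*x)*x'*y.
theta = np.linalg.pinv(x.T @ x) @ x.T @ y
print(theta)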

However, a long time ago, when I used NumPy to solve the same problem, I just used np.linalg.pinv(x) @ y. It is even stated in NumPy’s pinv docs that pinv solves the least-squares problem $Ax = b$, such that $\overline{x} = A^{+}b$.

So why should I compute $\theta = (X^{T}X)^{-1}X^{T}y$ when I can just compute $\theta = X^{-1}y$? Is there any difference?


EDIT:

Actually, it is easy to see that $\theta = (X^{T}X)^{-1}X^{T}y$ is right, because by definition,

$(X^{T}X)^{-1}(X^{T}X) = I$,

but thanks to the associative property of matrix multiplication, we can write the same equation as

$((X^{T}X)^{-1}X^{T})X = I$,

so we get that multiplying $X$ from the left by $(X^{T}X)^{-1}X^{T}$ gives $I$, meaning that it is the left-inverse of $X$. The left-inverse is the matrix used for solving the least-squares problem: multiplying both sides of $X\theta = y$ by it from the left turns the equation into $I\theta = (X^{T}X)^{-1}X^{T}y$, meaning that the coefficients are $\theta = (X^{T}X)^{-1}X^{T}y$.
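As a quick numerical illustration of this left-inverse property, here is a sketch with an arbitrary made-up tall matrix:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))   # tall matrix; full column rank with probability 1

left_inverse = np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(left_inverse @ X, np.eye(3)))   # True: (X^T X)^{-1} X^T is a left-inverse of X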

Similarly, the following equation is true by definition,

$(XX^{T})(XX^{T})^{-1} = I$,

which again, thanks to the associative property of matrix multiplication, can be written as

$X(X^{T}(XX^{T})^{-1}) = I$,

So we get that $(X^{T}(XX^{T})^{-1})$ is the right-inverse of $X$.
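Analogously, a sketch of the right-inverse property with a made-up wide matrix:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 6))   # wide matrix; full row rank with probability 1

right_inverse = X.T @ np.linalg.inv(X @ X.T)
print(np.allclose(X @ right_inverse, np.eye(3)))  # True: X^T (X X^T)^{-1} is a right-inverse of X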

One Answer

$X^{-1}$ only makes sense for square matrices. In least-squares problems, we often have a strictly skinny (more rows than columns), full-rank matrix $X$. When $X$ is square and full rank, you can use $\theta = X^{-1}y$, since then $(X^{T}X)^{-1}X^{T} = X^{-1}$. But when $X$ is strictly skinny and full rank, $X^{-1}$ does not exist; only the pseudoinverse $(X^{T}X)^{-1}X^{T}$ does, which leads to the formula given in the question.

Correct answer by user303375 on January 12, 2021
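To check the answer numerically, here is a small sketch (not part of the original answer) with made-up data: for a strictly skinny, full-rank $X$, np.linalg.pinv(X) coincides with $(X^{T}X)^{-1}X^{T}$, so both formulas from the question give the same coefficients.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 3))   # strictly skinny, full column rank
y = rng.normal(size=10)

theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y   # (X^T X)^{-1} X^T y
theta_pinv = np.linalg.pinv(X) @ y                # X^+ y

print(np.allclose(theta_normal, theta_pinv))      # True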
