
Derivation of the LMMSE (Linear Minimum Mean Squared Error) Estimate and the MMSE Under Gaussian Prior

Signal Processing Asked by McZhang on November 21, 2021

I am learning estimation theory from Steven M. Kay, Fundamentals of Statistical Signal Processing, Volume 1: Estimation Theory. In Chapter 12 (Linear Bayesian Estimators), Theorem 12.1 (Bayesian Gauss-Markov Theorem) gives the LMMSE estimator of the parameters from a linear noisy measurement:

If the data are described by the Bayesian linear model form
\begin{equation}
\boldsymbol{x}=\boldsymbol{H}\boldsymbol{\theta}+\boldsymbol{w} \tag{12.25}
\end{equation}

where $\boldsymbol{x}$ is an $N \times 1$ data vector, $\boldsymbol{H}$ is a known $N\times p$ observation matrix, $\boldsymbol{\theta}$ is a $p \times 1$ random vector of parameters whose realization is to be estimated and has mean $E(\boldsymbol{\theta})$ and covariance matrix $\boldsymbol{C}_{\theta\theta}$, and $\boldsymbol{w}$ is an $N \times 1$ random vector with zero mean and covariance matrix $\boldsymbol{C}_w$ and is uncorrelated with $\boldsymbol{\theta}$ (the joint PDF $p(\boldsymbol{w},\boldsymbol{\theta})$ is otherwise arbitrary), then the LMMSE estimator of $\boldsymbol{\theta}$ is
\begin{align}
\hat{\boldsymbol{\theta}} & = E(\boldsymbol{\theta})+\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \tag{12.26} \\
& = E(\boldsymbol{\theta})+(\boldsymbol{C}_{\theta\theta}^{-1}+\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H})^{-1}\boldsymbol{H}^T\boldsymbol{C}_w^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \tag{12.27}
\end{align}

The performance of the estimator is measured by the error $\boldsymbol{\epsilon}=\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}$, whose mean is zero and whose covariance matrix is
\begin{align}
\boldsymbol{C}_{\boldsymbol{\epsilon}} &= E_{\boldsymbol{x},\boldsymbol{\theta}}(\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T) \\
& = \boldsymbol{C}_{\theta\theta} - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}\boldsymbol{C}_{\theta\theta} \tag{12.28} \\
& = (\boldsymbol{C}_{\theta\theta}^{-1}+\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H})^{-1} \tag{12.29}
\end{align}
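
As a quick empirical sanity check on the theorem (my own sketch, not from the book; numpy is assumed, and I draw $\boldsymbol{\theta}$ and $\boldsymbol{w}$ as Gaussians purely for convenience), one can simulate the model and compare the empirical error covariance of the estimator 12.26 against 12.28:

import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 4, 2, 200_000

H = rng.standard_normal((N, p))
A = rng.standard_normal((p, p))
C_tt = A @ A.T + np.eye(p)              # prior covariance C_theta_theta (SPD)
B = rng.standard_normal((N, N))
C_w = B @ B.T + np.eye(N)               # noise covariance C_w (SPD)
mu = np.array([1.0, -2.0])              # prior mean E(theta)

# Draw (theta, w) and form x = H theta + w
theta = rng.multivariate_normal(mu, C_tt, size=trials)       # (trials, p)
w = rng.multivariate_normal(np.zeros(N), C_w, size=trials)   # (trials, N)
x = theta @ H.T + w                                          # (trials, N)

# LMMSE estimate, equation 12.26
S = H @ C_tt @ H.T + C_w
K = C_tt @ H.T @ np.linalg.inv(S)                            # gain matrix
theta_hat = mu + (x - mu @ H.T) @ K.T

# Empirical error covariance vs. 12.28 (should agree up to Monte Carlo error)
eps = theta - theta_hat
C_eps_emp = eps.T @ eps / trials
C_eps_1228 = C_tt - K @ H @ C_tt
print(np.round(C_eps_emp, 2))
print(np.round(C_eps_1228, 2))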

Since the prior of $\boldsymbol{\theta}$ is Gaussian, the LMMSE estimate $\hat{\boldsymbol{\theta}}_{LMMSE}$ is equivalent to the MMSE estimate $\hat{\boldsymbol{\theta}}_{MMSE}$, and $\hat{\boldsymbol{\theta}}_{MMSE}$ is equal to the posterior mean $E(\boldsymbol{\theta}|\boldsymbol{x})$. Since the prior and the likelihood are both Gaussian, the posterior distribution $p(\boldsymbol{\theta}|\boldsymbol{x})$ is also Gaussian.

Here I am trying to derive $\hat{\boldsymbol{\theta}}_{MMSE}$ and $\boldsymbol{C}_{\boldsymbol{\epsilon}}$ from the perspective of PDF multiplication, that is, to compute $p(\boldsymbol{\theta}|\boldsymbol{x}) \propto p(\boldsymbol{x}|\boldsymbol{\theta})\,p(\boldsymbol{\theta})=\mathcal{N}(\boldsymbol{x};\boldsymbol{H}\boldsymbol{\theta},\boldsymbol{C}_{w})\,\mathcal{N}(\boldsymbol{\theta};E(\boldsymbol{\theta}),\boldsymbol{C}_{\theta\theta})$ and collect the quadratic and first-order terms in $\boldsymbol{\theta}$ in the exponent to form a Gaussian PDF. The covariance matrix of $p(\boldsymbol{\theta}|\boldsymbol{x})$ that I obtain matches 12.29, but the posterior mean comes out in the following form:
\begin{equation}
E(\boldsymbol{\theta}|\boldsymbol{x}) = \boldsymbol{C}_{\boldsymbol{\epsilon}}(\boldsymbol{H}^T\boldsymbol{C}_w^{-1} \boldsymbol{x}+\boldsymbol{C}_{\theta\theta}^{-1}E(\boldsymbol{\theta})) \tag{q1}
\end{equation}
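
For completeness, this is the completing-the-square step I used (with $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ just shorthand for the posterior mean and covariance). Collecting the terms that depend on $\boldsymbol{\theta}$ in the exponent of the product gives
\begin{align}
& -\frac{1}{2}\Big[(\boldsymbol{x}-\boldsymbol{H}\boldsymbol{\theta})^T\boldsymbol{C}_w^{-1}(\boldsymbol{x}-\boldsymbol{H}\boldsymbol{\theta}) + (\boldsymbol{\theta}-E(\boldsymbol{\theta}))^T\boldsymbol{C}_{\theta\theta}^{-1}(\boldsymbol{\theta}-E(\boldsymbol{\theta}))\Big] \\
&= -\frac{1}{2}\Big[\boldsymbol{\theta}^T(\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H}+\boldsymbol{C}_{\theta\theta}^{-1})\boldsymbol{\theta} - 2\boldsymbol{\theta}^T(\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{x}+\boldsymbol{C}_{\theta\theta}^{-1}E(\boldsymbol{\theta}))\Big] + \text{const},
\end{align}
so matching against $-\frac{1}{2}(\boldsymbol{\theta}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\theta}-\boldsymbol{\mu})$ yields $\boldsymbol{\Sigma}^{-1}=\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H}+\boldsymbol{C}_{\theta\theta}^{-1}$ (i.e. 12.29) and $\boldsymbol{\mu}=\boldsymbol{\Sigma}(\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{x}+\boldsymbol{C}_{\theta\theta}^{-1}E(\boldsymbol{\theta}))$, which is q1.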

So my question is: is the posterior mean I got in q1 equal to the $\hat{\boldsymbol{\theta}}$ given in 12.26 and 12.27? If so, how can I show it?

By the way, I also can't see how to get from 12.26 to 12.27 (or from 12.28 to 12.29). Can someone give me a hint?

2 Answers

With the help of @Royi and @markleeds, I have found that the answer is yes: q1 is consistent with 12.26 and 12.27. The key to seeing this is the Woodbury Matrix Identity.

Going from 12.29 to 12.28 is straightforward with the Woodbury Matrix Identity.
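
In particular, applying $(\boldsymbol{A}+\boldsymbol{U}\boldsymbol{C}\boldsymbol{V})^{-1}=\boldsymbol{A}^{-1}-\boldsymbol{A}^{-1}\boldsymbol{U}(\boldsymbol{C}^{-1}+\boldsymbol{V}\boldsymbol{A}^{-1}\boldsymbol{U})^{-1}\boldsymbol{V}\boldsymbol{A}^{-1}$ with $\boldsymbol{A}=\boldsymbol{C}_{\theta\theta}^{-1}$, $\boldsymbol{U}=\boldsymbol{H}^T$, $\boldsymbol{C}=\boldsymbol{C}_w^{-1}$ and $\boldsymbol{V}=\boldsymbol{H}$ gives
\begin{equation}
(\boldsymbol{C}_{\theta\theta}^{-1}+\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H})^{-1} = \boldsymbol{C}_{\theta\theta} - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{C}_w+\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T)^{-1}\boldsymbol{H}\boldsymbol{C}_{\theta\theta},
\end{equation}
which is exactly 12.28.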

From 12.27 to 12.26:
\begin{align}
\hat{\boldsymbol{\theta}} & = E(\boldsymbol{\theta})+(\boldsymbol{C}_{\theta\theta}^{-1}+\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H})^{-1}\boldsymbol{H}^T\boldsymbol{C}_w^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \tag{12.27} \\
& = E(\boldsymbol{\theta}) + (\boldsymbol{C}_{\theta\theta} - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}\boldsymbol{C}_{\theta\theta})\boldsymbol{H}^T\boldsymbol{C}_w^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \\
&= E(\boldsymbol{\theta}) +\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T[\boldsymbol{I}-(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T]\boldsymbol{C}_{w}^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \\
& = E(\boldsymbol{\theta})+\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}[\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w-\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T]\boldsymbol{C}_{w}^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \\
& = E(\boldsymbol{\theta})+\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \tag{12.26}
\end{align}

From q1 to 12.26:
\begin{align}
E(\boldsymbol{\theta}|\boldsymbol{x}) &= \boldsymbol{C}_{\boldsymbol{\epsilon}}(\boldsymbol{H}^T\boldsymbol{C}_w^{-1} \boldsymbol{x}+\boldsymbol{C}_{\theta\theta}^{-1}E(\boldsymbol{\theta})) \tag{q1} \\
&= (\boldsymbol{C}_{\theta\theta} - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}\boldsymbol{C}_{\theta\theta})(\boldsymbol{H}^T\boldsymbol{C}_w^{-1} \boldsymbol{x}+\boldsymbol{C}_{\theta\theta}^{-1}E(\boldsymbol{\theta})) \\
&= E(\boldsymbol{\theta}) - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}E(\boldsymbol{\theta}) \\
& \quad + \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T\boldsymbol{C}_w^{-1} \boldsymbol{x} - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T\boldsymbol{C}_w^{-1} \boldsymbol{x} \\
&= E(\boldsymbol{\theta}) - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}E(\boldsymbol{\theta}) \\
& \quad + \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T [\boldsymbol{I}-(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T]\boldsymbol{C}_w^{-1} \boldsymbol{x} \\
&= E(\boldsymbol{\theta}) - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}E(\boldsymbol{\theta}) \\
& \quad + \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T (\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1} [\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w-\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T]\boldsymbol{C}_w^{-1} \boldsymbol{x} \\
& = E(\boldsymbol{\theta}) - \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}\boldsymbol{H}E(\boldsymbol{\theta}) + \boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T (\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1} \boldsymbol{x} \\
& = E(\boldsymbol{\theta})+\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1}(\boldsymbol{x}-\boldsymbol{H}E(\boldsymbol{\theta})) \tag{12.26}
\end{align}
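
As an additional numerical sanity check (my own sketch, not from Kay's book; numpy with random symmetric positive definite covariances is assumed), the four expressions can be compared directly:

import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 3

# Random Bayesian linear model: x = H theta + w
H = rng.standard_normal((N, p))
A = rng.standard_normal((p, p))
C_tt = A @ A.T + p * np.eye(p)          # prior covariance C_theta_theta (SPD)
B = rng.standard_normal((N, N))
C_w = B @ B.T + N * np.eye(N)           # noise covariance C_w (SPD)
mu = rng.standard_normal(p)             # prior mean E(theta)
x = rng.standard_normal(N)              # an arbitrary data realization

S = H @ C_tt @ H.T + C_w                # H C_tt H^T + C_w

# Estimator forms 12.26 and 12.27, error covariance forms 12.28 and 12.29
theta_1226 = mu + C_tt @ H.T @ np.linalg.solve(S, x - H @ mu)
C_eps_1229 = np.linalg.inv(np.linalg.inv(C_tt) + H.T @ np.linalg.solve(C_w, H))
theta_1227 = mu + C_eps_1229 @ H.T @ np.linalg.solve(C_w, x - H @ mu)
C_eps_1228 = C_tt - C_tt @ H.T @ np.linalg.solve(S, H @ C_tt)

# Posterior mean q1
theta_q1 = C_eps_1228 @ (H.T @ np.linalg.solve(C_w, x) + np.linalg.solve(C_tt, mu))

print(np.allclose(theta_1226, theta_1227))   # True: 12.26 == 12.27
print(np.allclose(theta_1226, theta_q1))     # True: q1 == 12.26
print(np.allclose(C_eps_1228, C_eps_1229))   # True: 12.28 == 12.29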

Reference: Wei Dai (Imperial College London), "A Tutorial on Kalman Filtering and MMSE Estimation of Gaussian Model", January 2013.

Answered by McZhang on November 21, 2021

In the past I derived it as follows:

[Image: derivation of the LMMSE estimator, not transcribed]

It is a slightly different approach.

If it answers your question I will rewrite it in proper LaTeX.

Regarding your question about the steps in the derivation you presented, they use the Woodbury Matrix Identity (both 12.26 to 12.27 and 12.28 to 12.29).
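
In particular, for 12.26 and 12.27 the two forms of the gain matrix are related by
\begin{equation}
\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T(\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w)^{-1} = (\boldsymbol{C}_{\theta\theta}^{-1}+\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H})^{-1}\boldsymbol{H}^T\boldsymbol{C}_w^{-1},
\end{equation}
which can be checked directly by multiplying both sides by $\boldsymbol{C}_{\theta\theta}^{-1}+\boldsymbol{H}^T\boldsymbol{C}_w^{-1}\boldsymbol{H}$ on the left and by $\boldsymbol{H}\boldsymbol{C}_{\theta\theta}\boldsymbol{H}^T+\boldsymbol{C}_w$ on the right.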


Answered by Royi on November 21, 2021
