
Seemingly Unrelated Regression Estimation - Equivalent to OLS Standard errors?

Asked by Michael Gmeiner on July 23, 2021 (Economics)

In a SURE framework, if all the $X$ are the same in all regressions, I was under the impression that there is no efficiency gain. Recently, an assistant professor told me that the beta coefficients would be the same as OLS, but that the standard errors would decrease due to the SURE framework, even though all the $X$ are the same.

Looking at the derivation in Greene’s 7th edition, section 10.2.2, I believe I am correct.

Can anyone clarify further? Does SURE give an efficiency improvement if all X are the same in all regressions?

One Answer

Assume that for each observation $i = 1,\ldots, N$, we have $M$ equations:
$$ y_{i,j} = x_{i,j}\beta_j + \varepsilon_{i,j} $$
where $i = 1,\ldots, N$ enumerates individuals and $j = 1,\ldots, M$ enumerates the equations. Here $x_{i,j}$ is of size $1 \times k_j$, $\beta_j$ is of size $k_j \times 1$, and $k_j$ is the number of covariates for regression $j$. Stacking over all $i = 1,\ldots, N$, we get $M$ equations:
$$ y_j = X_j \beta_j + \varepsilon_j $$
where now $X_j$ is of size $N \times k_j$. For simplicity, assume that the $X_j$ are non-stochastic. Next, assume that for all $i = 1,\ldots, N$ and $j = 1,\ldots, M$:
$$ \mathbb{E}(\varepsilon_{i,j}) = 0, \qquad \mathbb{E}(\varepsilon_{i,j}^2) = \sigma_{jj} $$
For the covariance between equations, let for all $i = 1,\ldots, N$ and $j,\ell = 1,\ldots, M$:
$$ \mathbb{E}(\varepsilon_{i,j} \varepsilon_{i,\ell}) = \sigma_{j,\ell} $$
while for all $j,\ell = 1,\ldots, M$ and $i,i' = 1,\ldots, N$ with $i \ne i'$:
$$ \mathbb{E}(\varepsilon_{i,j} \varepsilon_{i',\ell}) = 0 $$
This means that errors for the same individual might be correlated across equations, while errors for different individuals are uncorrelated.
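To see these moment assumptions in action, here is a minimal numpy sketch of this data-generating process; the sizes $N$, $M$, $k$ and the particular $\Sigma$ are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 500, 2, 3  # hypothetical sizes: N individuals, M equations, k covariates each

# Cross-equation error covariance Sigma (the same for every individual i)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

# One design matrix X_j and coefficient vector beta_j per equation
X = [rng.normal(size=(N, k)) for _ in range(M)]
beta = [rng.normal(size=k) for _ in range(M)]

# Draw errors row by row: within a row (same i) they are correlated across
# equations; across rows (different i) they are independent
eps = rng.multivariate_normal(np.zeros(M), Sigma, size=N)  # N x M

y = [X[j] @ beta[j] + eps[:, j] for j in range(M)]  # y_j = X_j beta_j + eps_j
```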

This can be expressed more compactly as:
$$ \operatorname{cov}(\varepsilon_j, \varepsilon_{\ell}) = \sigma_{j,\ell} I_N $$
Now, let us stack the various equations, one on top of the other:
$$ y = Z\beta + \varepsilon, $$
where:
$$ y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_M \end{bmatrix}, \quad Z = \begin{bmatrix} X_1 & 0 & \ldots & 0 \\ 0 & X_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & X_M \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_M \end{bmatrix} $$
The variance-covariance matrix of $\varepsilon$ takes the form:
$$ \mathbb{E}(\varepsilon \varepsilon') = V = \begin{bmatrix} \sigma_{11} I_N & \sigma_{12} I_N & \ldots & \sigma_{1M} I_N \\ \sigma_{21} I_N & \sigma_{22} I_N & \ldots & \sigma_{2M} I_N \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1} I_N & \sigma_{M2} I_N & \ldots & \sigma_{MM} I_N \end{bmatrix} = \Sigma \otimes I_N $$
where $\otimes$ is the Kronecker product and:
$$ \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \ldots & \sigma_{1M} \\ \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1} & \sigma_{M2} & \ldots & \sigma_{MM} \end{bmatrix} $$
$\Sigma$ gives the variance-covariance matrix of the errors for a fixed individual.
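A tiny numerical version of the stacked system (hypothetical sizes, using `scipy.linalg.block_diag` to build $Z$ and `np.kron` for $V = \Sigma \otimes I_N$):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
N, M = 4, 2  # tiny hypothetical sizes so the matrices can be inspected by eye
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

X1 = rng.normal(size=(N, 2))  # k_1 = 2
X2 = rng.normal(size=(N, 3))  # k_2 = 3

Z = block_diag(X1, X2)         # (MN) x (k_1 + k_2) block-diagonal design
V = np.kron(Sigma, np.eye(N))  # (MN) x (MN) error covariance, Sigma Kronecker I_N

assert Z.shape == (M * N, 5)
assert V.shape == (M * N, M * N)
```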

For the Kronecker product, we have the rules: $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, $(A \otimes B)(C \otimes D) = AC \otimes BD$, and $(A \otimes B)' = A' \otimes B'$.
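These identities hold whenever the dimensions are conformable (and the inverses exist), and are easy to check numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
A, C = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
B, D = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

# (A kron B)^{-1} = A^{-1} kron B^{-1}
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# (A kron B)(C kron D) = AC kron BD
assert np.allclose(np.kron(A, B) @ np.kron(C, D),
                   np.kron(A @ C, B @ D))

# (A kron B)' = A' kron B'
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
```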

Let $\hat \Sigma$ be the estimate of $\Sigma$ based on an initial OLS estimation of $y_j$ on $X_j$, and let $\hat V = \hat \Sigma \otimes I_N$. Then the feasible GLS estimator is given by:
$$ \begin{align*} \hat \beta &= (Z' \hat V^{-1} Z)^{-1} Z' \hat V^{-1} y \\ &= (Z'(\hat \Sigma \otimes I_N)^{-1} Z)^{-1} Z' (\hat \Sigma \otimes I_N)^{-1} y \\ &= (Z'(\hat \Sigma^{-1} \otimes I_N) Z)^{-1} Z' (\hat \Sigma^{-1} \otimes I_N) y \\ &= \beta + (Z'(\hat \Sigma^{-1} \otimes I_N) Z)^{-1} Z' (\hat \Sigma^{-1} \otimes I_N) \varepsilon \end{align*} $$
where the last line substitutes $y = Z\beta + \varepsilon$.
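As a sketch, here is the two-step feasible GLS under the assumptions above (the function name `sur_fgls` is mine, not a library routine): first equation-by-equation OLS to estimate $\Sigma$ from the residuals, then one GLS step on the stacked system.

```python
import numpy as np
from scipy.linalg import block_diag

def sur_fgls(X_list, y_list):
    """Feasible GLS for a SUR system.

    Step 1: equation-by-equation OLS to get residuals and hat Sigma.
    Step 2: GLS on the stacked system with hat V = hat Sigma kron I_N.
    """
    N = X_list[0].shape[0]
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        for X, y in zip(X_list, y_list)
    ])
    Sigma_hat = resid.T @ resid / N                       # M x M
    Z = block_diag(*X_list)                               # stacked design
    y = np.concatenate(y_list)                            # stacked outcome
    V_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))  # hat V^{-1}
    return np.linalg.solve(Z.T @ V_inv @ Z, Z.T @ V_inv @ y)
```

On data simulated as in the first sketch, `sur_fgls(X, y)` returns the stacked coefficient vector $(\hat\beta_1', \ldots, \hat\beta_M')'$.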

Now, let us assume that all the $X_j$ are identical, say $X$; then $Z = I_M \otimes X$ and we can further simplify:
$$ \begin{align*} \hat \beta &= (Z'(\hat \Sigma^{-1} \otimes I_N) Z)^{-1} Z' (\hat \Sigma^{-1} \otimes I_N) y \\ &= ((I_M \otimes X)'(\hat \Sigma^{-1} \otimes I_N)(I_M \otimes X))^{-1} (I_M \otimes X)'(\hat \Sigma^{-1} \otimes I_N) y \\ &= ((I_M \hat \Sigma^{-1} \otimes X' I_N)(I_M \otimes X))^{-1} (I_M \hat \Sigma^{-1} \otimes X' I_N) y \\ &= (\hat \Sigma^{-1} \otimes X'X)^{-1} (\hat \Sigma^{-1} \otimes X') y \\ &= (\hat \Sigma \otimes (X'X)^{-1})(\hat \Sigma^{-1} \otimes X') y \\ &= (\hat \Sigma \hat \Sigma^{-1} \otimes (X'X)^{-1} X') y \\ &= (I_M \otimes (X'X)^{-1} X') y \end{align*} $$
Notice that $\hat \Sigma$ has disappeared from this equation. The last equation can be written in the following way:
$$ \hat \beta = \begin{bmatrix} (X'X)^{-1} X' y_1 \\ (X'X)^{-1} X' y_2 \\ \vdots \\ (X'X)^{-1} X' y_M \end{bmatrix} = \beta + \begin{bmatrix} (X'X)^{-1} X' \varepsilon_1 \\ (X'X)^{-1} X' \varepsilon_2 \\ \vdots \\ (X'X)^{-1} X' \varepsilon_M \end{bmatrix} $$
So the feasible GLS estimates are identical to the OLS estimates from an equation-by-equation estimation. Notice that this also means that the residuals $\hat \varepsilon_j$ will be identical to the residuals from an OLS estimation.
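A quick numerical check of this result, on hypothetical simulated data with a common design matrix $X$: the feasible GLS coefficients coincide with equation-by-equation OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, k = 200, 3, 4
X = rng.normal(size=(N, k))            # the same X in every equation
Sigma = 0.5 * np.eye(M) + 0.5          # equicorrelated errors across equations
eps = rng.multivariate_normal(np.zeros(M), Sigma, size=N)
Y = X @ rng.normal(size=(k, M)) + eps  # column j of Y is y_j

# Equation-by-equation OLS
beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]  # k x M, column j is hat beta_j

# Feasible GLS on the stacked system
resid = Y - X @ beta_ols
Sigma_hat = resid.T @ resid / N
Z = np.kron(np.eye(M), X)                        # Z = I_M kron X
V_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))
y = Y.T.reshape(-1)                              # stack y_1, ..., y_M
beta_fgls = np.linalg.solve(Z.T @ V_inv @ Z, Z.T @ V_inv @ y).reshape(M, k).T

print(np.allclose(beta_ols, beta_fgls))          # True: FGLS == OLS with common X
```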

Now, to estimate the variance-covariance matrix, we take the product $(\hat \beta - \beta)(\hat \beta - \beta)'$, which gives a matrix whose diagonal blocks are:
$$ \begin{align*} (\hat \beta_j - \beta_j)(\hat \beta_j - \beta_j)' &= [(X'X)^{-1} X' \varepsilon_j][(X'X)^{-1} X' \varepsilon_j]' \\ &= (X'X)^{-1} X' \varepsilon_j \varepsilon_j' X (X'X)^{-1} \end{align*} $$
Then for equation $j$, using $\mathbb{E}(\varepsilon_j \varepsilon_j') = \sigma_{jj} I_N$, we have the variance-covariance matrix:
$$ V(\hat \beta_j) = \mathbb{E}\left((\hat \beta_j - \beta_j)(\hat \beta_j - \beta_j)'\right) = \sigma_{jj}\left(X'X\right)^{-1}. $$
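A small Monte Carlo check of this formula, with a hypothetical fixed design and $\sigma_{jj} = 2$: the sampling covariance of $\hat\beta_j - \beta_j = (X'X)^{-1} X' \varepsilon_j$ should approximate $\sigma_{jj}(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, sigma_jj, reps = 100, 2, 2.0, 50_000
X = rng.normal(size=(N, k))       # fixed (non-stochastic) design
A = np.linalg.inv(X.T @ X) @ X.T  # (X'X)^{-1} X'

# hat beta_j - beta_j = A eps_j, so its covariance across draws of eps_j
# should approximate sigma_jj * (X'X)^{-1}
draws = A @ rng.normal(scale=np.sqrt(sigma_jj), size=(N, reps))  # k x reps
print(np.cov(draws))                      # Monte Carlo covariance
print(sigma_jj * np.linalg.inv(X.T @ X))  # theoretical value
```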

As $\sigma_{jj}$ is not known, it is usually estimated by $\hat \sigma_{jj} = \frac{1}{N}\sum_i \hat \varepsilon_{i,j}^2$, where the $\hat \varepsilon_{i,j}$ are the residuals of the feasible GLS estimator. However, in this case, these will be identical to the residuals of the OLS estimator (as the estimates $\hat \beta$ are identical). As such, the estimates of the variances of $\hat \beta$ for the SUR will be identical to the variance estimates of the OLS estimates (equation by equation).
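In code, the point is simply that both routes use the same residuals, so they produce the same standard errors; a one-equation sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(4)
N, k = 500, 3
X = rng.normal(size=(N, k))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=1.5, size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat          # identical for OLS and for SUR here
sigma_jj_hat = resid @ resid / N  # hat sigma_jj = (1/N) * sum_i resid_i^2
se = np.sqrt(sigma_jj_hat * np.diag(np.linalg.inv(X.T @ X)))
print(se)                         # the same standard errors either way
```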

Correct answer by tdm on July 23, 2021
