TransWikia.com

Endogeneity testing using correlation test

Cross Validated Asked by sabiste on November 9, 2021

I am currently testing my linear model, estimated by OLS. The last thing I have to check is endogeneity. Is it enough to test each explanatory variable for correlation with the error term? That is, I save the residuals of my original model and pass them to cor.test in R, paired with each explanatory variable. I would like to test first whether an endogeneity problem exists before moving on to more advanced methods for dealing with it.

I know there are proper tests, for example the Hausman test, which compares the OLS and 2SLS results, but 2SLS and IV seem very complicated to me given my level of knowledge.

3 Answers

dimitriy's reply may be enough, as markowitz says, but I'd like to add a very simple simulation:

> set.seed(1234)             
> x <- rnorm(1000)          # predictor
> u <- x + rnorm(1000)      # "true" error, correlated with x
> y <- 3 + 2*x + u          # outcome

Let's fit a linear model:

> fit <- lm(y ~ x)
> fit
[...]
Coefficients:
(Intercept)            x  
      3.029        3.016  

As you can see, the estimated coefficient for $x$ is biased: it is about 3, while the true value is 2. Why? Because $x$ and $u$ are correlated:

> cor(x,u)
[1] 0.7073596

What about residuals?

> r <- fit$residuals
> cor(x,r)
[1] 2.200033e-17

$x$ and residuals are not correlated, and they are never correlated. Why? Well, we need a bit of math: $$\text{if}\quad\hat\beta=(X^TX)^{-1}X^Ty,\quad\text{then}\quad r=y-X\hat\beta=y-X(X^TX)^{-1}X^Ty$$ and we always have: $$X^Tr=X^Ty-X^TX(X^TX)^{-1}X^Ty=0$$
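The identity $X^Tr=0$ can also be checked numerically. The following is a minimal sketch that reuses the simulated data from above; the names `X` and `xtr` are introduced here only for illustration. It builds the design matrix (intercept column plus $x$) and multiplies its transpose by the residual vector.

```r
set.seed(1234)
x <- rnorm(1000)              # predictor
u <- x + rnorm(1000)          # "true" error, correlated with x
y <- 3 + 2 * x + u            # outcome

fit <- lm(y ~ x)
X <- model.matrix(fit)        # design matrix: intercept column and x
r <- resid(fit)               # OLS residuals

xtr <- crossprod(X, r)        # computes t(X) %*% r
print(xtr)                    # both entries are zero up to rounding error
print(mean(r))                # zero too, because X contains the intercept column
```

Both entries of `xtr` vanish up to floating-point error even though $x$ and $u$ are strongly correlated by construction, which is exactly why the residuals carry no information about endogeneity.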

markowitz says: "I suppose that sabiste conflated the role of residuals with that of true error terms. Common mistake among neophyte." Sure, but not only among neophytes :)

Fifteen years ago a paper argued that "exogeneity constraints that are commonly assumed in econometric treatments of the Gauss-Markov theorem are unnecessary for OLS estimates of the classical linear regression model to be BLU" [...] "because orthogonality is a property of all OLS estimates. The geometry of least squares forces the errors in a regression equation to be orthogonal to all of the regressors in the equation."

A few years later, another paper was published in the same journal. Its title was: Wouldn't It Be Nice...? The Automatic Unbiasedness of OLS (and GLS): "the intrinsic orthogonality he is thinking of is of $X$ with $\hat{u}$ [my $r$], not $u$."

I think that reading those papers could be an (amusing, and) useful way to better understand the endogeneity issue.

Answered by Sergio on November 9, 2021

dimitriy's reply may be enough. However, I suppose that your question comes from a "rule" frequently stated in econometrics books: briefly, if some included regressors and the error term are correlated, we have an endogeneity problem. Unfortunately, some presentations do not make clear what kind of "error term" this "rule" involves.

We can read it as the "true error term", the error term of the true model. The exogeneity assumption for OLS comes from here.

Alternatively, we can read this "error term" as the error term of the misspecified model, where the misspecification becomes evident only if the true model is known.

In the real world this error term is an unobservable quantity. What you observe are the "residuals", which are related but different things. From residuals alone we cannot detect endogeneity; in fact, in the OLS framework exogeneity is an untestable assumption.
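To illustrate why testing needs information from outside the OLS fit, here is a hedged sketch of the regression-based (control-function) form of the Durbin-Wu-Hausman test mentioned in the question. Everything in it is an assumption made for the simulation: the instrument `z`, the confounder `e`, and the data-generating process are hypothetical, and in real data the validity of an instrument cannot be taken for granted.

```r
set.seed(1234)
n <- 1000
z <- rnorm(n)                 # hypothetical instrument: moves x, unrelated to u
e <- rnorm(n)                 # unobserved component shared by x and u
x <- z + e                    # x is endogenous through e
u <- e + rnorm(n)             # true error: correlated with x, not with z
y <- 3 + 2 * x + u            # true slope is 2

# OLS ignores the endogeneity and overstates the slope
b_ols <- unname(coef(lm(y ~ x))["x"])

# Manual 2SLS: first stage x on z, second stage y on the fitted values
# (point estimate only; the second-stage standard errors are not valid)
first_stage <- lm(x ~ z)
x_hat <- fitted(first_stage)
b_tsls <- unname(coef(lm(y ~ x_hat))["x_hat"])

# Control-function test: add the first-stage residuals to the OLS regression;
# a significant coefficient on v_hat signals that x is endogenous
v_hat <- resid(first_stage)
dwh <- lm(y ~ x + v_hat)
p_dwh <- summary(dwh)$coefficients["v_hat", "Pr(>|t|)"]

print(c(ols = b_ols, tsls = b_tsls, p_dwh = p_dwh))
```

The test works only because `z` brings in variation in `x` that is unrelated to the true error. Without such an instrument, nothing in the OLS output distinguishes the endogenous case from the exogenous one.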

EDIT: Just a warning. The problem of endogeneity (and hence exogeneity) is of tremendous importance in econometrics and can be written down in various versions. Partly for this reason, debate, and sometimes confusion, about these concepts is common. In my view, concepts like endogeneity (and hence exogeneity) must always be related to causality and, therefore, to structural concepts. I have written about this on this site; see for instance:

endogenous regressor and correlation

Regression and causality in econometrics

Endogeneity in forecasting

Setting those aspects aside, here I limit myself to guessing what sabiste had in mind when writing the question. In econometrics presentations it is common to trace various problems, such as omitted variables, simultaneity, and measurement error, back to the endogeneity problem. In short, endogeneity implies bias in some parameters.

In the "rule", the correlation between errors and included regressors is indicated as the core of the problem, its telltale sign. We can also read Wikipedia:

If the independent variable is correlated with the error term in a regression model then the estimate of the regression coefficient in an ordinary least squares (OLS) regression is biased; however if the correlation is not contemporaneous, then the coefficient estimate may still be consistent.

https://en.wikipedia.org/wiki/Endogeneity_(econometrics)

At least at a general level, no other conditions are added. I suppose that sabiste conflated the role of residuals with that of error terms, understood as clarified above. This is a common mistake among neophytes.

Answered by markowitz on November 9, 2021

This would not give you a valid test of endogeneity. Estimated residuals will be uncorrelated with included regressors by construction. You can work through the math or find a derivation, but you can also easily convince yourself of this with a simple simulation.
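One possible version of such a simulation is sketched below. The helper `residual_cor_p` is a hypothetical name introduced for this example: it generates data either with an exogenous or with an endogenous regressor, fits OLS, and applies cor.test to the regressor and the residuals, as proposed in the question.

```r
set.seed(1234)
n <- 1000

# Hypothetical helper: simulate one data set, fit OLS, and run the
# proposed residual-based correlation test
residual_cor_p <- function(endogenous) {
  x <- rnorm(n)
  # true error: depends on x only in the endogenous case
  u <- if (endogenous) x + rnorm(n) else rnorm(n)
  y <- 3 + 2 * x + u                      # true slope is 2
  fit <- lm(y ~ x)
  c(slope   = unname(coef(fit)["x"]),
    p_value = cor.test(x, resid(fit))$p.value)
}

res_exo  <- residual_cor_p(endogenous = FALSE)
res_endo <- residual_cor_p(endogenous = TRUE)
print(rbind(exogenous = res_exo, endogenous = res_endo))
```

The slope is close to 2 in the exogenous case and close to 3 in the endogenous case, yet the p-value of the residual correlation test is essentially 1 in both, so the test cannot tell the two situations apart.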

Answered by dimitriy on November 9, 2021
