Counterexample where E(u|x)=0 in a regression model cannot hold in the population?

Question

Edit:
Background information:
I have two variables of interest, $y$ and $x$, that are linearly related via $y = a + bx + u$. Here $a$ and $b$ are fixed parameters to solve for, and $u$ is the error term. It captures the fact that other factors also affect $y$ that are not captured by $a + bx$ alone. In this dataset there would be nine error terms $u$, one for each of the nine $(x, y)$ pairs below.
We are told in textbooks that if we assume $$E(u|x) = 0$$ then the population regression function can be interpreted as $E(y|x) = a + bx$, where "$a$" and "$b$" are the population parameters.
However I can construct a counterexample where this doesn't hold:
$$\begin{array}{c|c}
x & y \\ \hline
1 & 12 \\
2 & 14 \\
3 & 16 \\
4 & 20 \\
5 & 25 \\
6 & 29 \\
7 & 31 \\
8 & 40 \\
9 & 20
\end{array}$$
Suppose the above is a population dataset (I'm assuming I know the population exactly; there is no sample). Then $E(y|x) = y$. For instance, $E(y|x=1) = 12$, since $x=1$ has only one $y$ value, which is equal to 12. If so, what is the population regression line relating $y$ to $x$ in the linear form $y = a + bx + u$ for which $E(u|x) = 0$ actually holds?
Solving for $a$ and $b$ with a regression calculator gives $\hat{y} = 10.58 + 2.48x$. However, this does not satisfy $E(u|x) = 0$. For $x = 1$, for example, the predicted value (about 13.07) is not equal to 12, so there is an error. How, then, can linear regression satisfy $E(u|x) = 0$?
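The following is a minimal Python sketch of that calculation, treating the nine pairs as the whole population. It fits the least-squares line exactly using `fractions.Fraction`, then prints $E(u \mid X = x)$. Here that value is just the single residual at each $x$. The helper name `ols_fit` is our own and does not come from any library.

For this data the fit is $127/12 + (149/60)x \approx 10.58 + 2.48x$. No residual is zero, yet the residuals sum to zero and are orthogonal to $x$. Those two conditions, not $E(u \mid x) = 0$, are what least squares actually enforces.

```python
from fractions import Fraction

# The nine (x, y) pairs from the question, treated as the entire population.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [12, 14, 16, 20, 25, 29, 31, 40, 20]


def ols_fit(xs, ys):
    """Exact least-squares intercept and slope (Fractions avoid rounding)."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = ols_fit(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"a = {float(a):.4f}, b = {float(b):.4f}")
# Each x has a single y, so E(u | X = x) is just that one residual; none is zero.
for x, u in zip(xs, residuals):
    print(f"E(u | X = {x}) = {float(u):+.4f}")
# What least squares does guarantee: residuals sum to zero and are orthogonal to x.
print("sum of u   =", sum(residuals))
print("sum of x*u =", sum(x * u for x, u in zip(xs, residuals)))
```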
Does this mean $E(u|x) = 0$ doesn't hold in the population and is just an assumption?
Even under the assumption that $E(u|x) = 0$, I cannot find a linear equation that satisfies it for the population dataset above. What are the implications for real-world examples where the assumption doesn't seem to hold?

BigBendRegion · Answer

This is a great example that illustrates why, in regression models, (i) the assumption $E(u | x) = 0$ should not be used, and (ii) the "population" framework should not be used.
Rather than the assumption $E(u \mid x) = 0$, it would make much more sense to state the assumption in the equivalent form $E(y \mid X = x) = \beta_0 + \beta_1 x$. This assumption states that the means of the conditional distributions fall exactly on a line of the form $\beta_0 + \beta_1 x$, for some $\beta_0$, $\beta_1$.
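The equivalence follows in one line from the model $y = \beta_0 + \beta_1 x + u$ and linearity of conditional expectation, treating $\beta_0$ and $\beta_1$ as fixed constants:

$$
E(y \mid X = x) = E(\beta_0 + \beta_1 x + u \mid X = x) = \beta_0 + \beta_1 x + E(u \mid X = x).
$$

Hence $E(u \mid X = x) = 0$ for every $x$ exactly when $E(y \mid X = x) = \beta_0 + \beta_1 x$ for every $x$.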
As the OP notes, the conditional distributions in the population framework are all degenerate, so each mean is just equal to the single $y$ value. For example, the distribution of $Y \mid X = 9$ is given by $\Pr(Y = 20 \mid X = 9) = 1$, with $\Pr(Y = y \mid X = 9) = 0$ for all $y \neq 20$. The mean of this distribution is clearly 20.
Since these conditional mean values do not all fall precisely on a straight line, the assumption $E(y \mid X = x) = \beta_0 + \beta_1 x$ is violated. This explains the OP's finding that $E(u \mid x) \neq 0$.
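This condition can be checked directly, as sketched below in Python. The sketch groups a finite population by $x$ and computes each conditional mean $E(y \mid X = x)$. It then tests whether all the points $(x, E(y \mid X = x))$ lie on one straight line. The function names `conditional_means` and `means_on_a_line` are hypothetical helpers written for this illustration.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_means(pairs):
    """Return {x: E(y | X = x)} for a finite population of (x, y) pairs."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: Fraction(sum(v), len(v)) for x, v in sorted(groups.items())}


def means_on_a_line(means):
    """True if every point (x, E(y | X = x)) lies on a single straight line."""
    pts = list(means.items())
    if len(pts) <= 2:
        return True
    (x0, m0), (x1, m1) = pts[0], pts[1]
    slope = (m1 - m0) / (x1 - x0)  # x values are distinct dict keys, so no division by zero
    return all(m == m0 + slope * (x - x0) for x, m in pts)


# The OP's population: one y per x, so each conditional distribution is degenerate.
population = [(1, 12), (2, 14), (3, 16), (4, 20), (5, 25),
              (6, 29), (7, 31), (8, 40), (9, 20)]
means = conditional_means(population)
print(means[9])               # the single y value at x = 9
print(means_on_a_line(means))
```

Here the means at $x = 1, 2, 3$ lie on a line of slope 2, but the mean at $x = 4$ does not. No choice of $\beta_0, \beta_1$ can therefore make $E(u \mid x) = 0$ hold at every $x$.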
Here is an example "population" where it works.
$$\begin{array}{c|c}
x & y \\ \hline
1 & 12 \\
1 & 14 \\
1 & 16 \\
2 & 20 \\
2 & 24 \\
2 & 28 \\
3 & 31 \\
3 & 33 \\
3 & 38
\end{array}$$
Here the conditional means are 14, 24, and 34. They fall on the linear function $E(y \mid X = x) = 4 + 10x$ at $x = 1, 2, 3$. Consider the distribution of $y \mid X = 3$:
$$\begin{array}{c|c}
y & \Pr(Y = y \mid X = 3) \\ \hline
31 & 1/3 \\
33 & 1/3 \\
38 & 1/3
\end{array}$$
The distribution of $u$ is obtained by replacing the $y$ values with $y-34$, so $E(u | X = 3) = (1/3)(31 - 34) + (1/3)(33-34) + (1/3)(38-34) = 0.$
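The same calculation can be repeated at every $x$, as in the short Python sketch below. It uses the line $4 + 10x$ through the conditional means and treats each population member as equally likely within its $x$ group. It then computes $E(u \mid X = x)$ exactly for $x = 1, 2, 3$.

```python
from collections import defaultdict
from fractions import Fraction

# The second "population" from the answer: three y values at each of x = 1, 2, 3.
population = [(1, 12), (1, 14), (1, 16),
              (2, 20), (2, 24), (2, 28),
              (3, 31), (3, 33), (3, 38)]

# The conditional means 14, 24, 34 lie on the line 4 + 10x.
beta0, beta1 = 4, 10

# Collect the errors u = y - (beta0 + beta1 * x), grouped by x.
errors = defaultdict(list)
for x, y in population:
    errors[x].append(y - (beta0 + beta1 * x))

# Each y is equally likely within its group, so E(u | X = x) is the group average.
for x, us in sorted(errors.items()):
    print(f"x = {x}: u values {us}, E(u | X = {x}) = {Fraction(sum(us), len(us))}")
```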
If the means of the distributions are configured so that they do not fall exactly on a line, then $E(u \mid X = x) \neq 0$ for some $x$.
This example also illustrates why the "population model" should not be used to define the regression model. As the examples show, the conditional means are not really true means in the scientific, generalizable sense. Instead, they are quite noisy because of the small sample sizes in the subpopulations defined by "$\mid x$". In some cases there may be no observations at all in such subsets of the population, even when the population is large. This problem is magnified multiplicatively in multiple regression.
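The small-cell noise can be seen in a rough simulation sketch. The following assumptions are our own, for illustration only, and do not come from the answer: the underlying conditional mean is $4 + 10x$, errors are normal with standard deviation 5, and $x$ takes the values 1 to 10. With only a few population members per value of $x$, the within-cell averages scatter visibly around the line. With many members per cell, they settle close to it.

```python
import random
import statistics

# Illustrative assumptions (not from the answer): E(y | X = x) = 4 + 10x,
# normal errors with sd 5, and x taking the values 1..10.
BETA0, BETA1, SIGMA = 4, 10, 5
X_VALUES = range(1, 11)


def max_cell_deviation(per_cell, rng):
    """Largest |cell average - (BETA0 + BETA1 * x)| over all x, using per_cell draws per x."""
    worst = 0.0
    for x in X_VALUES:
        ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for _ in range(per_cell)]
        worst = max(worst, abs(statistics.fmean(ys) - (BETA0 + BETA1 * x)))
    return worst


rng = random.Random(0)  # fixed seed so the run is reproducible
small = max_cell_deviation(3, rng)
large = max_cell_deviation(3000, rng)
print(f"3 per cell:    max deviation {small:.2f}")
print(f"3000 per cell: max deviation {large:.2f}")
```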
