Is linear regression an estimate of the conditional expectation?

Question

I am studying regression for the first time and a few questions have come up. First, is linear regression an estimate of the conditional expectation? And is that estimate of the conditional expectation the so-called $\hat{y}$? That is:
$$y=E(Y\mid X)+e$$
$$y=\hat{y}+e$$
$$\hat{y}=E(Y\mid X)$$
$$E(Y\mid X)=\alpha+\beta X$$
Second, is linearity in the parameters an assumption that linear regression needs in order to estimate the conditional expectation?

Third, Hansen's book on econometrics says the following about this: "the linear CEF model is empirically unlikely to be accurate unless $x$ is discrete and low-dimensional so all interactions are included. Consequently in most cases it is more realistic to view the linear specification as an approximation". How should this passage be interpreted?

Dave · Answer

Yes and yes. There is a subtle technical point here, though I hesitate to mention it until you’ve gotten used to the idea of regression predicting an expected value instead of just a number that “should” be the right answer.

(Don’t read this parenthetical part for a few months or years until you’re much more comfortable with regression. The subtle point is that we often don’t see the predictors as random variables, so there isn’t a multivariate distribution where we condition on many variables to examine $Y$. We think of $Y\vert X$ as a family of univariate distributions that are parameterized by the predictor variables. This is technically correct in many cases but not especially useful, particularly not to a beginner.)
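To make the fixed-design view in that parenthetical concrete, here is a minimal Python/NumPy sketch. The parameter values alpha, beta and sigma and the normal errors are assumptions chosen purely for illustration. The x values are fixed in advance rather than drawn, and at each one Y is drawn from its own univariate distribution whose mean is alpha + beta*x.

```python
import numpy as np

# Hypothetical parameters: E(Y | X = x) = alpha + beta * x with normal noise
alpha, beta, sigma = 1.0, 2.0, 0.5
rng = np.random.default_rng(0)

# Fixed design: these x values are chosen by the analyst, not drawn at random
x_design = np.array([0.0, 1.0, 2.0, 3.0])

# Each column holds repeated draws from the univariate distribution of Y at one fixed x
n_reps = 20_000
y_draws = alpha + beta * x_design + sigma * rng.standard_normal((n_reps, x_design.size))

# The average of the draws at each x should sit close to alpha + beta * x
sample_means = y_draws.mean(axis=0)
true_means = alpha + beta * x_design
print(np.round(sample_means, 3))
print(true_means)
```

No joint distribution of (X, Y) is ever specified here; there is just one distribution of Y per chosen x.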

Right again!

For the first two, I think it will make more sense once you start simulating regressions. I’ll let you think about how to do that, and I can come back and edit this answer with some R code, but I do think it’s a good exercise to work through on your own for a while.
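One way to do such a simulation, sketched here in Python with NumPy rather than R: the data-generating process E(Y | X) = 1 + 2X, the uniform X and the normal noise are all hypothetical choices. If ordinary least squares is estimating the conditional expectation, the fitted values $\hat{y}$ should land close to 1 + 2X, and the residuals $y - \hat{y}$ play the role of $e$ in $y = \hat{y} + e$.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data-generating process: E(Y | X) = 1 + 2X, noise independent of X
n = 10_000
x = rng.uniform(0, 10, size=n)
e = rng.normal(0, 3, size=n)
y = 1 + 2 * x + e

# Ordinary least squares on the design matrix [1, x]
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
residuals = y - y_hat

# Compare the fitted values with the true conditional expectation
cef = 1 + 2 * x
print("estimated (intercept, slope):", np.round(coef, 3))
print("largest |y_hat - E(Y|X)|:", np.abs(y_hat - cef).max())
print("mean residual:", residuals.mean())
```

Rerunning with a larger n (or a different seed) is a quick way to see the fitted line settle onto the true conditional mean.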

This gets into a George Box quote: “All models are wrong, but some are useful.” No, real phenomena probably don’t follow perfectly linear patterns, much like real data don’t follow perfectly normal distributions. However, a linear model might be a good enough approximation for us to do something useful.

dimitriy · Answer

Linear regression provides the minimum mean squared error linear-in-parameters approximation to the CEF. Just as a smooth function can be approximated by a Taylor series expansion with enough terms, you can approximate even a nonlinear CEF quite well by including lots of interactions and polynomial terms, as long as you have enough data and have not left anything important out of your model.
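To illustrate the Taylor-series point, here is a minimal Python/NumPy sketch that assumes a hypothetical CEF $E(Y \vert X) = \exp(X)$ with X uniform on [0, 2] and normal noise. Every fit below is still linear in the parameters; adding $X^2$ and $X^3$ as regressors should shrink the gap between the fitted curve and the true CEF.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical nonlinear CEF for illustration: E(Y | X) = exp(X), X ~ Uniform(0, 2)
n = 50_000
x = rng.uniform(0, 2, size=n)
y = np.exp(x) + rng.normal(0, 1, size=n)

def poly_design(x, degree):
    """Columns 1, x, x^2, ..., x^degree (nonlinear in x, linear in the parameters)."""
    return np.column_stack([x ** k for k in range(degree + 1)])

grid = np.linspace(0, 2, 201)
mse_by_degree = {}
for degree in (1, 2, 3):
    coef, *_ = np.linalg.lstsq(poly_design(x, degree), y, rcond=None)
    approx = poly_design(grid, degree) @ coef
    # Mean squared distance between the fitted polynomial and the true CEF on a grid
    mse_by_degree[degree] = np.mean((approx - np.exp(grid)) ** 2)
    print(f"degree {degree}: MSE against true CEF = {mse_by_degree[degree]:.5f}")
```

The straight line is the best linear-in-$X$ approximation, but it is visibly off for a curved CEF; the extra polynomial terms are what buy the improvement.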
If your world truly is low-dimensional and discrete, you can approximate the CEF very well by calculating the mean in each cell (like the average wage for college-educated Asian women who live in the Midwest and enjoy musical theatre). This is what it means to include all interactions. With continuous covariates this is harder, since you have to either bin your data or smooth it to interpolate where you have no observations, and the approximation can be quite poor.
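A quick check of the "mean in each cell" idea, as a Python/NumPy sketch with hypothetical simulated data: with one dummy for every combination of a five-valued x and a binary z (all interactions, no separate intercept), the OLS coefficients coincide with the cell means.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical discrete covariates: x in {1,...,5}, z in {0, 1} -> 10 cells
n = 2_000
x = rng.integers(1, 6, size=n)
z = rng.integers(0, 2, size=n)
y = rng.poisson(np.exp(0.2 * x + 0.3 * z + 0.1 * x * z))  # any outcome works here

# "All interactions": one dummy per (x, z) cell, no separate intercept
cells = [(a, b) for a in range(1, 6) for b in range(2)]
D = np.column_stack([((x == a) & (z == b)).astype(float) for a, b in cells])

coef, *_ = np.linalg.lstsq(D, y, rcond=None)
cell_means = np.array([y[(x == a) & (z == b)].mean() for a, b in cells])

# With a saturated set of dummies, each OLS coefficient is exactly its cell's mean
print(np.allclose(coef, cell_means))
```

Because the cell dummies are non-overlapping indicators, least squares decouples into one problem per cell, and the solution of each is that cell's sample mean.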
Here's a toy example where we approximate a fairly nonlinear Poisson CEF, $$E[Y \vert X,Z] = \exp(a + b \cdot X + c \cdot Z + d \cdot X \cdot Z)$$ with cell means and with a regression that includes all interactions. (Strictly, the code below draws y with rpoisson(x+2*z), so the conditional mean it actually simulates is x + 2z.) Here X takes on 5 values and Z takes on 2, so we have 10 cells in total if we use dummy variables:
. set obs 5
number of observations (_N) was 0, now 5

. gen x = _n

. expand 100
(495 observations created)

. gen z = mod(_n,2)

. gen y = rpoisson(x+2*z)

. table x z, c(mean y)

----------------------
          |     z     
        x |    0     1
----------+-----------
        1 | 1.06  2.76
        2 | 2.04  4.16
        3 | 2.96  4.96
        4 | 4.26  6.58
        5 | 5.18  6.76
----------------------

. quietly reg y i.x#i.z

. margins x#z

Adjusted predictions                            Number of obs     =        500
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x#z |
        1 0  |       1.06   .2874746     3.69   0.000      .495165    1.624835
        1 1  |       2.76   .2874746     9.60   0.000     2.195165    3.324835
        2 0  |       2.04   .2874746     7.10   0.000     1.475165    2.604835
        2 1  |       4.16   .2874746    14.47   0.000     3.595165    4.724835
        3 0  |       2.96   .2874746    10.30   0.000     2.395165    3.524835
        3 1  |       4.96   .2874746    17.25   0.000     4.395165    5.524835
        4 0  |       4.26   .2874746    14.82   0.000     3.695165    4.824835
        4 1  |       6.58   .2874746    22.89   0.000     6.015165    7.144835
        5 0  |       5.18   .2874746    18.02   0.000     4.615165    5.744835
        5 1  |       6.76   .2874746    23.52   0.000     6.195165    7.324835
------------------------------------------------------------------------------

. quietly poisson y i.x#i.z

. margins x#z

Adjusted predictions                            Number of obs     =        500
Model VCE    : OIM

Expression   : Predicted number of events, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x#z |
        1 0  |       1.06   .1456022     7.28   0.000     .7746249    1.345375
        1 1  |       2.76   .2349468    11.75   0.000     2.299513    3.220487
        2 0  |       2.04   .2019901    10.10   0.000     1.644107    2.435893
        2 1  |       4.16   .2884441    14.42   0.000      3.59466     4.72534
        3 0  |       2.96   .2433105    12.17   0.000      2.48312     3.43688
        3 1  |       4.96   .3149603    15.75   0.000     4.342689    5.577311
        4 0  |       4.26   .2918904    14.59   0.000     3.687905    4.832095
        4 1  |       6.58   .3627671    18.14   0.000     5.868989    7.291011
        5 0  |       5.18   .3218695    16.09   0.000     4.549147    5.810853
        5 1  |       6.76   .3676955    18.38   0.000      6.03933     7.48067
------------------------------------------------------------------------------

If you omit the interaction between X and Z, you get something slightly worse, since the predictions are forced into an additive pattern and no longer reproduce the cell means:
. quietly reg y i.x i.z

. margins x#z

Adjusted predictions                            Number of obs     =        500
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x#z |
        1 0  |      1.024   .2111675     4.85   0.000     .6091028    1.438897
        1 1  |      2.936   .2111675    13.90   0.000     2.521103    3.350897
        2 0  |      1.914   .2111675     9.06   0.000     1.499103    2.328897
        2 1  |      3.826   .2111675    18.12   0.000     3.411103    4.240897
        3 0  |      3.324   .2111675    15.74   0.000     2.909103    3.738897
        3 1  |      5.236   .2111675    24.80   0.000     4.821103    5.650897
        4 0  |      3.854   .2111675    18.25   0.000     3.439103    4.268897
        4 1  |      5.766   .2111675    27.31   0.000     5.351103    6.180897
        5 0  |      5.084   .2111675    24.08   0.000     4.669103    5.498897
        5 1  |      6.996   .2111675    33.13   0.000     6.581103    7.410897
------------------------------------------------------------------------------

When the true CEF contains an interaction, as in the exponential form above, omitting it is an example of misspecification.
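To see that misspecification with an outcome drawn directly from the exponential CEF, here is a Python/NumPy sketch. The values of a, b, c and d are hypothetical, the design is balanced with 2,000 draws per cell, and the seed is fixed so the run is reproducible. It compares fitted values from the saturated model (the analogue of i.x#i.z) and the additive model (the analogue of i.x i.z) with the true CEF.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical parameter values for E[Y | X, Z] = exp(a + b*X + c*Z + d*X*Z)
a, b, c, d = 0.0, 0.3, 0.2, 0.15

# Balanced design: x in {1,...,5}, z in {0, 1}, the same number of draws per cell
n_per_cell = 2_000
cells = [(xv, zv) for xv in range(1, 6) for zv in (0, 1)]
x = np.repeat([xv for xv, _ in cells], n_per_cell)
z = np.repeat([zv for _, zv in cells], n_per_cell)
true_cef = np.exp(a + b * x + c * z + d * x * z)
y = rng.poisson(true_cef)

def ols_fitted(D, y):
    """Fitted values from an ordinary least squares fit of y on the columns of D."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return D @ coef

# Saturated model: one dummy per (x, z) cell
D_sat = np.column_stack([((x == xv) & (z == zv)).astype(float) for xv, zv in cells])
# Additive model: intercept, dummies for x = 2..5, and a dummy for z = 1
D_add = np.column_stack([np.ones_like(x, dtype=float)]
                        + [(x == xv).astype(float) for xv in range(2, 6)]
                        + [(z == 1).astype(float)])

mse = {}
for name, D in (("saturated", D_sat), ("additive", D_add)):
    # Average squared gap between the fitted values and the true CEF
    mse[name] = np.mean((ols_fitted(D, y) - true_cef) ** 2)
    print(name, round(mse[name], 4))
```

The saturated fit should track the CEF up to sampling noise, while the additive fit stays off by a margin that more data will not remove, because it cannot represent the X·Z interaction.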
