# Why do we need to emphasize sufficient statistics in generalized linear models?

Cross Validated Asked by tranquil.coder on November 27, 2020

In generalized linear models, $$p(y;\eta)=b(y)\exp\left(\eta^T T(y)-a(\eta)\right), \qquad \eta=\theta^T x,$$ we assume $$x$$ is the input variable and $$y$$ is the output, and our goal is to model how the distribution of the output $$y$$ given the input $$x$$ depends on
$$\theta$$, i.e. $$y=h_\theta(x)$$.
Slides (e.g. Andrew Ng's cs229, ucb cs294) emphasize that $$T(y)$$ is a sufficient statistic, but then obtain the parameter $$\theta$$ by maximum likelihood estimation. I know the definition of a sufficient statistic and the factorization theorem, but I can't understand these questions:

1. Why do we need to emphasize that $$T(y)$$ is a sufficient statistic in a GLM? Maximum likelihood estimation can always produce an estimate, so does it matter whether $$T(y)$$ is a sufficient statistic or not? If we can get the parameter $$\theta$$ by maximum likelihood estimation, how does $$T(y)$$ help in the process of estimating $$\theta$$?
2. When we want to get $$h_\theta(x)$$, why do we compute $$E(T(y))$$ (cs229 says that $$E(T(y))$$ is one of the three components of a GLM)? In other words, why is $$h_\theta(x)=E(T(y))$$? Is it just because $$T(y)$$ is usually $$y$$ in the exponential family (i.e. $$T(y)=y$$)? How are $$h_\theta(x)$$ and $$E(T(y))$$ linked? For logistic regression, cs229 gives $$h_\theta(x) = p(y=1\mid x;\theta) = 0 \cdot p(y=0\mid x;\theta) + 1\cdot p(y=1\mid x;\theta) = E[y\mid x;\theta]$$, but that is a special distribution; what about other distributions?
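
To make the logistic-regression identity in question 2 concrete, here is a minimal numerical sketch. It assumes the usual Bernoulli parameterization of the exponential family, $$b(y)=1$$, $$T(y)=y$$, $$a(\eta)=\log(1+e^\eta)$$, and the values of `theta` and `x` are made up purely for illustration. It only checks the Bernoulli case and says nothing about other distributions.

```python
import math

def sigmoid(z):
    """Logistic function, the usual h_theta(x) for logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_expfam_pmf(y, eta):
    """p(y; eta) = b(y) * exp(eta * T(y) - a(eta)) with the assumed
    Bernoulli choices b(y) = 1, T(y) = y, a(eta) = log(1 + e^eta)."""
    a = math.log1p(math.exp(eta))
    return math.exp(eta * y - a)

def expected_T(eta):
    """E[T(y)] computed directly as a sum over the support {0, 1}."""
    return sum(y * bernoulli_expfam_pmf(y, eta) for y in (0, 1))

# Hypothetical parameter and input vectors (not from any dataset).
theta = [0.5, -1.0]
x = [2.0, 0.3]
eta = sum(t * xi for t, xi in zip(theta, x))  # eta = theta^T x

total_mass = bernoulli_expfam_pmf(0, eta) + bernoulli_expfam_pmf(1, eta)
print("pmf sums to:", total_mass)
print("E[T(y)]    :", expected_T(eta))
print("sigmoid    :", sigmoid(eta))
```

Under these assumptions the last two printed values should agree up to floating-point error, which is the identity $$h_\theta(x)=E[y\mid x;\theta]$$ quoted from cs229.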

