
Why do we need to emphasize sufficient statistics in generalized linear models?

Cross Validated. Asked by tranquil.coder on November 27, 2020

In generalized linear models,
$$p(y;\eta)=b(y)\exp\big(\eta^T T(y)-a(\eta)\big), \qquad \eta=\theta^T x,$$
we assume $x$ is the input variable and $y$ is the output, and our goal is to model the distribution of the output $y$ given $x$, which depends on $\theta$, and to predict via the hypothesis $y=h_\theta(x)$.
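For concreteness, here is a worked instance of this form (my own example, written in the cs229 notation): the Bernoulli distribution with mean $\phi$ becomes

$$p(y;\phi)=\phi^y(1-\phi)^{1-y}=\exp\Big(y\log\frac{\phi}{1-\phi}+\log(1-\phi)\Big),$$

so $b(y)=1$, $T(y)=y$, $\eta=\log\frac{\phi}{1-\phi}$, and $a(\eta)=\log(1+e^\eta)$. Reading the density as $b(y)\cdot\exp(\eta^T T(y)-a(\eta))$ is exactly the factorization-theorem form, which is why $T(y)$ is sufficient for $\eta$.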
Slides (e.g., Andrew Ng's cs229, UCB cs294) emphasize that $T(y)$ is a sufficient statistic, yet they obtain the parameter $\theta$ by maximum likelihood estimation. I know the definition of a sufficient statistic and the factorization theorem, but I can't understand these questions:

  1. Why do we need to emphasize that $T(y)$ is a sufficient statistic in a GLM? Maximum likelihood estimation always yields an estimate, so does it matter whether or not $T(y)$ is a sufficient statistic? Given that we can obtain the parameter $\theta$ by maximum likelihood estimation, how does $T(y)$ help in the process of estimating $\theta$?
  2. When we want to get $h_\theta(x)$, why do we compute $E(T(y))$ (cs229 says that $E(T(y))$ is one of the three components of a GLM)? In other words, why is $h_\theta(x)=E(T(y))$? Is it just because $T(y)$ is usually $y$ in the exponential family (i.e., $T(y)=y$)? How are $h_\theta(x)$ and $E(T(y))$ linked? For logistic regression, cs229 gives $$h_\theta(x) = p(y=1\mid x;\theta) = 0 \cdot p(y=0\mid x;\theta) + 1\cdot p(y=1\mid x;\theta) = E[y\mid x;\theta],$$ but that is one special distribution; what about other distributions? (See the numerical sketch after this list.)
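To make question 2 concrete, here is a minimal numerical sketch (my own, not from the slides), assuming the standard exponential-family fact that $E[T(y)]=a'(\eta)$: for the Bernoulli this derivative is the sigmoid, and for the Poisson it is $e^\eta$, so the hypothesis $h_\theta(x)=E[T(y)\mid x;\theta]$ with $\eta=\theta^T x$ can be checked by sampling. The values of `theta` and `x` below are arbitrary and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Natural parameter eta = theta^T x (illustrative values)
theta = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])
eta = theta @ x

# Bernoulli: T(y) = y, canonical response a'(eta) = sigmoid(eta)
phi = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, phi, size=100_000)
print(phi, y.mean())   # sample mean of T(y) matches sigmoid(eta)

# Poisson: T(y) = y, canonical response a'(eta) = exp(eta)
lam = np.exp(eta)
y = rng.poisson(lam, size=100_000)
print(lam, y.mean())   # sample mean of T(y) matches exp(eta)
```

In both cases the sample mean of $T(y)$ agrees (up to Monte Carlo error) with the canonical response evaluated at $\eta=\theta^T x$, which is the sense in which $h_\theta(x)=E[T(y)\mid x;\theta]$ generalizes the logistic-regression computation above.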
