# Why do we need to emphasize sufficient statistics in generalized linear models?

Cross Validated Asked by tranquil.coder on November 27, 2020

In generalized linear models,
$$p(y;\eta)=b(y)\exp\left(\eta^T T(y)-a(\eta)\right),\qquad \eta=\theta^T x,$$
we assume $x$ is the input variable and $y$ is the output. Our goal is to obtain the distribution of $y$ given the input $x$, which depends on $\theta$, i.e. $y=h_\theta(x)$.
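To make the form above concrete, here is a minimal sketch assuming scalar $\eta$ and the usual textbook choices: for the Bernoulli, $b(y)=1$, $T(y)=y$, $a(\eta)=\log(1+e^\eta)$; for the Poisson, $b(y)=1/y!$, $T(y)=y$, $a(\eta)=e^\eta$. It checks that $b(y)\exp(\eta T(y)-a(\eta))$ reproduces the familiar pmfs, and compares a finite-difference $a'(\eta)$ with $E[T(y)]$ (the exponential-family identity $a'(\eta)=E[T(y)]$). The helper names (`bernoulli_expfam`, `poisson_expfam`, `numeric_derivative`) are illustrative only.

```python
import math

# Bernoulli: p(y; phi) = phi^y (1 - phi)^(1 - y), natural parameter eta = log(phi / (1 - phi))
def bernoulli_expfam(y, eta):
    a = math.log1p(math.exp(eta))        # log-partition a(eta) = log(1 + e^eta)
    return 1.0 * math.exp(eta * y - a)   # b(y) = 1, T(y) = y

# Poisson: p(y; lam) = lam^y e^(-lam) / y!, natural parameter eta = log(lam)
def poisson_expfam(y, eta):
    a = math.exp(eta)                    # log-partition a(eta) = e^eta
    return math.exp(eta * y - a) / math.factorial(y)  # b(y) = 1 / y!, T(y) = y

def numeric_derivative(f, x, h=1e-6):
    # central finite difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

eta = 0.7
phi = 1.0 / (1.0 + math.exp(-eta))       # Bernoulli mean: sigmoid(eta)
lam = math.exp(eta)                      # Poisson mean: e^eta

# The exponential-family forms reproduce the usual pmfs
for y in (0, 1):
    assert math.isclose(bernoulli_expfam(y, eta), phi**y * (1 - phi)**(1 - y))
for y in range(6):
    assert math.isclose(poisson_expfam(y, eta), lam**y * math.exp(-lam) / math.factorial(y))

# a'(eta) should agree closely with E[T(y)] = E[y] in both cases
a_bern = lambda e: math.log1p(math.exp(e))
print(numeric_derivative(a_bern, eta), phi)
print(numeric_derivative(math.exp, eta), lam)

# E[y] computed directly from the Poisson pmf (sum truncated at y = 49)
mean_pois = sum(y * poisson_expfam(y, eta) for y in range(50))
print(mean_pois, lam)
```

In the Bernoulli case $a'(\eta)$ is the sigmoid, which is where the logistic-regression hypothesis comes from; the Poisson case gives $e^\eta$.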
Slides (e.g. Andrew Ng's cs229, UCB cs294) emphasize that $T(y)$ is a sufficient statistic, yet obtain the parameter $\theta$ by maximum likelihood estimation. I know the definition of a sufficient statistic and the factorization theorem, but I can't understand the following:

1. Why do we need to emphasize that $T(y)$ is a sufficient statistic in a GLM? Maximum likelihood estimation always yields an estimate, so does it matter whether $T(y)$ is a sufficient statistic or not? If we can obtain $\theta$ by maximum likelihood, how does $T(y)$ help in the process of estimating $\theta$?
2. When we want $h_\theta(x)$, why do we compute $E[T(y)]$ (cs229 says that $E[T(y)]$ is one of the three components of a GLM)? In other words, why is $h_\theta(x)=E[T(y)]$? Is it just because $T(y)$ is usually $y$ itself in the exponential family (i.e. $T(y)=y$)? How do we link $h_\theta(x)$ and $E[T(y)]$? For logistic regression, cs229 gives $h_\theta(x) = p(y=1\mid x;\theta) = 0 \cdot p(y=0\mid x;\theta) + 1\cdot p(y=1\mid x;\theta) = E[y\mid x;\theta]$, but that is a special distribution; what about other distributions?
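Relating to the first question, a minimal sketch of what sufficiency means in this setting, assuming the canonical case $\eta=\theta^T x$ with scalar $T(y)=y$ (logistic regression). For $n$ observations the log-likelihood is $\ell(\theta)=\theta^T\sum_i x_i T(y_i)-\sum_i a(\theta^T x_i)+\sum_i\log b(y_i)$, so it depends on the responses only through $\sum_i x_i T(y_i)=X^T T(y)$. The sketch fits two made-up toy datasets that share the same $X$ and the same $X^T y$ but have different $y$ vectors, using a plain Newton-Raphson loop; `fit_logistic` is a hypothetical helper, not cs229 code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: logistic-regression MLE by Newton-Raphson."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ theta)                           # E[y | x; theta] for each row
        score = X.T @ y - X.T @ p                        # log-likelihood gradient; y enters only via X.T @ y
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])   # sum_i p_i (1 - p_i) x_i x_i^T
        step = np.linalg.solve(hessian, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Toy data (made up for illustration): intercept column plus one feature
x = np.arange(8.0)
X = np.column_stack([np.ones_like(x), x])
y1 = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)
y2 = np.array([0, 1, 0, 0, 1, 0, 1, 1], dtype=float)

t1, t2 = X.T @ y1, X.T @ y2   # sufficient statistic sum_i x_i y_i; both equal (4, 18)
theta1, theta2 = fit_logistic(X, y1), fit_logistic(X, y2)
print(t1, t2)
print(theta1, theta2)
```

Because the score $X^T(y-p)$ sees $y$ only through $X^T y$, every Newton iterate is the same for both datasets, so the two estimates coincide even though the individual responses differ.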

## Related Questions

### Random Censoring scheme in Weibull Distribution

0  Asked on December 8, 2020 by soham-bagchi

### fixed effects vs random effects vs random intercept model

1  Asked on December 8, 2020 by daniela-rodrigues

### Immediate NaN in loss function with custom activation without extreme batch size–how to prevent exploding gradients?

0  Asked on December 8, 2020 by rain

### Eacf table interpretation in R

2  Asked on December 8, 2020

### log-odds and its standard error as priors in logistic regression

1  Asked on December 8, 2020 by r_user

### What is the difference between $\beta_1$ and $\hat{\beta}_1$?

3  Asked on December 8, 2020 by stan-shunpike

### Interpret credible intervals / HPD following posterior sampling

1  Asked on December 8, 2020 by walterb

### Why GEE estimates are smaller than GLMM?

1  Asked on December 7, 2020

### Help with the prior distribution

1  Asked on December 7, 2020 by dom-jo

### Conservative confidence interval for linear combination of parameters

0  Asked on December 7, 2020

### Neural network based on twitter followers, what would be my features?

5  Asked on December 6, 2020 by sharki

### Check if residuals are IID (timeseries)

4  Asked on December 5, 2020 by mgr

### How to split dataset for time-series prediction?

5  Asked on December 4, 2020 by tobip

### What are some existing techniques for pose estimation angle normalization?

1  Asked on December 4, 2020 by tbizzy0808

### ARIMA model with multiple covariates, XREG

1  Asked on December 3, 2020 by bromideh

### Measure of rater agreements for rank order?

1  Asked on December 3, 2020 by cdalitz

### Which distribution to use in the following scenario?

1  Asked on December 3, 2020 by arindam-bose

### Comparing a random sample and a non random sample extracted from a finite population

1  Asked on December 1, 2020 by alessandro-jacopson

### Logistic Regression cost function for joint optimization based on relevance and profit

0  Asked on December 1, 2020 by marsellus-wallace

### Calculate R² from regression estimates (beta coefficients, variances of variables etc.)

0  Asked on December 1, 2020 by phx