# Why is the standard deviation of the average of averages smaller than the standard deviation of the total?

Cross Validated Asked by Pinocchio on December 9, 2020

Say I want to estimate the test error. I can either draw $N$ batches $B_1, \dots, B_N$ of $B$ examples each and then take the average of their average errors $\mu(B_n)$ (so the R.V. being averaged is the batch mean):

$$\frac{1}{N} \sum_{n=1}^{N} \mu(B_n)$$

or I can collect all the errors and then take one massive average (so the R.V. being averaged is the per-example loss):

$$\frac{1}{NB} \sum_{i=1}^{NB} L(z_i)$$
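A minimal NumPy sketch of the two estimators, assuming the per-example losses are already available as one array of length $NB$ and that every batch has exactly $B$ examples. The losses below are synthetic (a made-up exponential distribution), not the data from the experiment, and the function names are hypothetical. Under the equal-batch-size assumption the two point estimates coincide up to floating-point rounding.

```python
import numpy as np

def average_of_batch_averages(losses: np.ndarray, batch_size: int) -> float:
    """(1/N) * sum_n mu(B_n), where mu(B_n) is the mean loss of batch n."""
    # reshape raises ValueError unless len(losses) is a multiple of batch_size
    batches = losses.reshape(-1, batch_size)  # shape (N, B)
    batch_means = batches.mean(axis=1)        # mu(B_1), ..., mu(B_N)
    return float(batch_means.mean())

def grand_average(losses: np.ndarray) -> float:
    """(1/(NB)) * sum_i L(z_i): one big average over all examples."""
    return float(losses.mean())

rng = np.random.default_rng(0)
N, B = 50, 64                                      # hypothetical number of batches and batch size
losses = rng.exponential(scale=12.0, size=N * B)   # synthetic per-example losses

a = average_of_batch_averages(losses, B)
b = grand_average(losses)
print(a, b)  # equal up to rounding when all batches have the same size
```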

I’m fairly certain that both give the same estimate of the error, but do they have the same standard deviation? From my numerical experiments I don’t think they do (with the first one being superior to the second, especially as $B$ gets larger):

Error with average of averages
80%|████████  | 4/5 [01:12<00:18, 18.14s/it]
-> err = 12.598432122111321 +-1.7049395893844483

Error with sum of everything
80%|████████  | 4/5 [01:11<00:17, 17.77s/it]
-> err = 11.505614456176758 +-13.968025155156523
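The code behind the two `+-` values is not shown, so the following is an assumption: one plausible reading is that the first `+-` is the sample standard deviation of the $N$ batch means $\mu(B_n)$, while the second is the sample standard deviation of the $NB$ individual losses $L(z_i)$. These measure different things. For i.i.d. losses with standard deviation $\sigma$, a batch mean spreads by about $\sigma/\sqrt{B}$, whereas a single loss spreads by $\sigma$. The sketch below simulates synthetic i.i.d. losses (a made-up exponential distribution, whose standard deviation equals its scale) to compare the two spreads, plus the standard error of the final estimate computed each way.

```python
import numpy as np

rng = np.random.default_rng(1)
N, B = 200, 64   # hypothetical number of batches and batch size
sigma = 12.0     # exponential(scale) has standard deviation equal to scale

losses = rng.exponential(scale=sigma, size=(N, B))  # synthetic i.i.d. per-example losses
batch_means = losses.mean(axis=1)                    # mu(B_1), ..., mu(B_N)

# Spread of the batch means: roughly sigma / sqrt(B)
std_of_batch_means = batch_means.std(ddof=1)
# Spread of the individual losses: roughly sigma
std_of_losses = losses.std(ddof=1)

# Standard error of the overall estimate, computed both ways;
# both approximate sigma / sqrt(N * B)
se_from_batches = std_of_batch_means / np.sqrt(N)
se_from_losses = std_of_losses / np.sqrt(N * B)

print(f"std of batch means : {std_of_batch_means:.3f}  (theory {sigma / np.sqrt(B):.3f})")
print(f"std of losses      : {std_of_losses:.3f}  (theory {sigma:.3f})")
print(f"SE via batches     : {se_from_batches:.4f}")
print(f"SE via all losses  : {se_from_losses:.4f}  (theory {sigma / np.sqrt(N * B):.4f})")
```

In this simulation the two spreads differ by a factor of roughly $\sqrt{B}$, while the two standard-error estimates agree closely.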


What is the difference? Is the covariance somehow affecting things, and if so, how?
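A short variance calculation may clarify where covariance enters. Assume, as an idealization not stated above, that the $N$ batches partition the same $NB$ examples into groups of exactly $B$, and that each loss $L(z_i)$ has variance $\sigma^2$. Then the two estimators are algebraically the same random variable, and covariances between losses enter both in exactly the same way:

$$
\frac{1}{N}\sum_{n=1}^{N}\mu(B_n)
= \frac{1}{N}\sum_{n=1}^{N}\frac{1}{B}\sum_{z\in B_n}L(z)
= \frac{1}{NB}\sum_{i=1}^{NB}L(z_i),
$$

$$
\operatorname{Var}\!\left(\frac{1}{NB}\sum_{i=1}^{NB}L(z_i)\right)
= \frac{1}{(NB)^2}\sum_{i=1}^{NB}\sum_{j=1}^{NB}\operatorname{Cov}\big(L(z_i),L(z_j)\big)
\;\overset{\text{i.i.d.}}{=}\; \frac{\sigma^2}{NB}.
$$

Under the same i.i.d. assumption, $\operatorname{Var}(\mu(B_n)) = \sigma^2/B$ and $\operatorname{Var}(L(z_i)) = \sigma^2$, so a spread computed over batch means and one computed over individual losses differ by a factor of about $\sqrt{B}$ in standard deviation, even though the estimators themselves coincide.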

I think I understand that I could simply make the batch size very large instead of taking lots of averages, but now I am annoyed that I don’t understand the difference between these two. I don’t think there should be a difference, and if there is one, WHEN does it happen?
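One concrete case where the two point estimates do differ, offered as an illustration rather than a diagnosis of the experiment above: if the batches have unequal sizes (for example a short final batch when the number of examples is not a multiple of the batch size), the average of averages gives each example in a small batch more weight. The numbers below are made up. Here the average of averages is about 9.67 while the grand average is about 6.22; weighting each batch mean by its batch size recovers the grand average.

```python
import numpy as np

# Hypothetical losses split into batches of unequal size (the last batch is short).
batches = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([5.0, 6.0, 7.0, 8.0]),
    np.array([20.0]),
]

batch_means = [b.mean() for b in batches]

avg_of_avgs = float(np.mean(batch_means))           # each batch weighted equally
grand_avg = float(np.concatenate(batches).mean())   # each example weighted equally

# Weighting each batch mean by its size recovers the grand average.
sizes = np.array([len(b) for b in batches])
weighted = float(np.average(batch_means, weights=sizes))

print(avg_of_avgs, grand_avg, weighted)
```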
