# Why is the standard deviation of the average of averages smaller than the standard deviation of the total?

Cross Validated Asked by Pinocchio on December 9, 2020

Say I want to estimate the test error. I can either take $N$ batches $B_n$, each of size $B$, and then average their average errors (so the R.V. is the batch mean):

$$\frac{1}{N} \sum_{n=1}^{N} \mu(B_n)$$
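A minimal NumPy sketch of this first estimator. It assumes the per-example losses are already stored in an array of shape `(N, B)`, one row per batch; the function name `mean_of_batch_means` is hypothetical:

```python
import numpy as np

def mean_of_batch_means(losses: np.ndarray) -> float:
    """Average of the N per-batch average losses.

    Assumes `losses` has shape (N, B): row n holds the B losses of batch B_n.
    """
    batch_means = losses.mean(axis=1)  # mu(B_n) for each batch, shape (N,)
    return float(batch_means.mean())   # (1/N) * sum_n mu(B_n)

losses = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(mean_of_batch_means(losses))  # batch means 2.0 and 5.0 -> prints 3.5
```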

or I can collect all the errors and then take one massive average (so the R.V. is the loss):

$$\frac{1}{NB} \sum_{i=1}^{NB} L(z_i)$$
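The pooled estimator can be sketched the same way, again assuming an `(N, B)` loss array; the helper `pooled_mean` and the exponential losses are hypothetical stand-ins. The sketch also checks numerically that, when every batch has the same size $B$, the two point estimates coincide up to floating-point rounding, because $\frac{1}{N}\sum_{n}\frac{1}{B}\sum_{j} L_{n,j} = \frac{1}{NB}\sum_{n,j} L_{n,j}$:

```python
import numpy as np

def pooled_mean(losses: np.ndarray) -> float:
    """Single average over all N*B individual losses."""
    return float(losses.reshape(-1).mean())  # (1/(NB)) * sum_i L(z_i)

rng = np.random.default_rng(0)
# Hypothetical losses: N = 5 batches of B = 1000 examples each
losses = rng.exponential(scale=10.0, size=(5, 1000))

mean_of_means = float(losses.mean(axis=1).mean())
print(pooled_mean(losses), mean_of_means)
print(np.isclose(pooled_mean(losses), mean_of_means))  # True for equal-size batches
```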

I’m fairly certain that they both give the same error estimate, but do they have the same std? From my numerical experiments I don’t think they do (with the first one being superior to the second, especially as $B$ gets larger):

```
Error with average of averages
80%|████████  | 4/5 [01:12<00:18, 18.14s/it]
-> err = 12.598432122111321 +-1.7049395893844483

Error with sum of everything
80%|████████  | 4/5 [01:11<00:17, 17.77s/it]
-> err = 11.505614456176758 +-13.968025155156523
```
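One way to see where two such different ± values could come from is a small simulation. It assumes, as a guess about the setup rather than something stated above, that each ± is the sample standard deviation of whatever quantity was averaged: the $N$ batch means in the first case and the $NB$ individual losses in the second. For i.i.d. losses with standard deviation $\sigma$, the first is roughly $\sigma/\sqrt{B}$ and the second roughly $\sigma$, which would make the gap grow with $B$. The helper `spread_of_both` and the exponential loss distribution are hypothetical:

```python
import numpy as np

def spread_of_both(n_batches: int, batch_size: int, seed: int = 0):
    """Return (std of batch means, std of individual losses) for simulated i.i.d. losses."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-example losses; for an exponential, the std equals the scale (sigma = 10)
    losses = rng.exponential(scale=10.0, size=(n_batches, batch_size))
    std_of_batch_means = losses.mean(axis=1).std(ddof=1)  # spread of mu(B_n), about sigma / sqrt(B)
    std_of_losses = losses.reshape(-1).std(ddof=1)        # spread of L(z_i), about sigma
    return float(std_of_batch_means), float(std_of_losses)

for b in (10, 100, 1000):
    s_means, s_losses = spread_of_both(n_batches=200, batch_size=b)
    print(f"B={b:5d}  std(batch means)={s_means:6.3f}  std(losses)={s_losses:6.3f}  "
          f"sigma/sqrt(B)={10.0 / np.sqrt(b):6.3f}")
```

Under the same i.i.d. assumption, dividing each spread by the square root of its own sample count (`s_means / sqrt(N)` versus `s_losses / sqrt(N*B)`) gives roughly the same standard error for the overall estimate, about $\sigma/\sqrt{NB}$.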


What is the difference? Is the covariance somehow affecting things? If so, how?
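For reference, this is where covariance would enter: the variance of a single batch mean expands as

$$\operatorname{Var}\!\left(\frac{1}{B}\sum_{j=1}^{B} L(z_j)\right) = \frac{1}{B^2}\left(\sum_{j=1}^{B}\operatorname{Var}\big(L(z_j)\big) + \sum_{j \neq k}\operatorname{Cov}\big(L(z_j), L(z_k)\big)\right)$$

Assuming the losses are independent with common variance $\sigma^2$, every covariance term vanishes and this reduces to $\sigma^2/B$. If the $N$ batches are also independent, averaging them gives $\operatorname{Var}\big(\frac{1}{N}\sum_{n=1}^{N}\mu(B_n)\big) = \frac{\sigma^2}{NB}$, which equals $\operatorname{Var}\big(\frac{1}{NB}\sum_{i=1}^{NB} L(z_i)\big)$ under the same assumption.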

I think I understand that I can just make the batch size super big instead of taking lots of averages, but now I am annoyed that I don’t understand the difference between these two. I don’t think there should be a difference, and if there is one, WHEN does it happen?
