Why is the standard deviation of the average of averages smaller than the standard deviation of the total?

Question

Say I want to estimate the test error. I can either draw $N$ batches $B_n$ and take the average of their average errors (so the R.V. is the batch mean):

$$ \frac{1}{N} \sum^N_{n=1} \mu(B_n) $$

or, with $B$ examples per batch, I can collect all the individual errors and then take one massive average (so the R.V. is the loss):

$$ \frac{1}{NB} \sum^{NB}_{i=1} L(z_i) $$

I'm fairly certain that both give the same expected error, but do they have the same std? From my numerical experiments I don't think they do (with the first one being superior to the second, especially as $B$ gets larger):

Error with average of averages
80%|████████  | 4/5 [01:12<00:18, 18.14s/it]
-> err = 12.598432122111321 +-1.7049395893844483

Error with sum of everything
80%|████████  | 4/5 [01:11<00:17, 17.77s/it]
-> err = 11.505614456176758 +-13.968025155156523
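
A minimal sketch of how such a comparison could be reproduced on synthetic data. It assumes i.i.d. losses (exponential draws standing in for $L(z_i)$), equal batch sizes, and that the reported `+-` is the sample standard deviation of the quantities being averaged. `compare_spreads` and its parameters are hypothetical names, not the code that produced the output above.

```python
import random
import statistics


def compare_spreads(n_batches=200, batch_size=64, seed=0):
    """Return (avg of batch means, std of batch means, grand mean, std of all losses)."""
    rng = random.Random(seed)
    # Hypothetical stand-in for per-example losses L(z_i):
    # i.i.d. exponential draws with mean 10 (an assumption, not the real data).
    batches = [
        [rng.expovariate(1 / 10.0) for _ in range(batch_size)]
        for _ in range(n_batches)
    ]

    # Estimator 1: average the per-batch means mu(B_n);
    # the spread is measured across the N batch means.
    batch_means = [statistics.mean(batch) for batch in batches]
    avg_of_avgs = statistics.mean(batch_means)
    std_of_batch_means = statistics.stdev(batch_means)

    # Estimator 2: pool every loss and average once;
    # the spread is measured across the N*B individual losses.
    all_losses = [loss for batch in batches for loss in batch]
    grand_mean = statistics.mean(all_losses)
    std_of_losses = statistics.stdev(all_losses)

    return avg_of_avgs, std_of_batch_means, grand_mean, std_of_losses


if __name__ == "__main__":
    a, sa, g, sg = compare_spreads()
    print(f"average of averages: {a:.3f} +- {sa:.3f}")
    print(f"sum of everything:   {g:.3f} +- {sg:.3f}")
    print(f"ratio of stds: {sg / sa:.2f}, sqrt(batch_size): {64 ** 0.5:.2f}")
```

With equal batch sizes the two point estimates coincide up to rounding; only the quantity whose spread is measured differs (batch means versus individual losses).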

What is the difference? Is the covariance somehow affecting things, and if so, how?

I think I understand that I can just make the batch size very large instead of averaging many batch means, but now I am annoyed that I don't understand the difference between these two. I don't think there should be a difference, and if there is one, when does it occur?
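
For reference, a sketch of the relevant quantities under assumptions the question does not state: the losses $L(z_i)$ are i.i.d. with variance $\sigma^2$, every batch holds exactly $B$ examples, and $\mu(B_n)$ is the mean loss over batch $B_n$. In that case the two estimators are the same number:

$$ \frac{1}{N} \sum^N_{n=1} \mu(B_n) = \frac{1}{N} \sum^N_{n=1} \frac{1}{B} \sum_{z \in B_n} L(z) = \frac{1}{NB} \sum^{NB}_{i=1} L(z_i) $$

while the three variances that can be reported for them are different:

$$ \operatorname{Var}\left[ L(z_i) \right] = \sigma^2, \qquad \operatorname{Var}\left[ \mu(B_n) \right] = \frac{\sigma^2}{B}, \qquad \operatorname{Var}\left[ \frac{1}{NB} \sum^{NB}_{i=1} L(z_i) \right] = \frac{\sigma^2}{NB} $$

Under these assumptions, a std computed across the $N$ batch means would be smaller than a std computed across the $NB$ individual losses by a factor of $\sqrt{B}$. If the losses are correlated within a batch, the covariance terms no longer vanish and $\operatorname{Var}[\mu(B_n)]$ is not $\sigma^2 / B$.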

Tags: machine-learning, mean, non-independent, standard-deviation, standard-error
