Cross Validated Asked by Pinocchio on December 9, 2020

Say I want to estimate the test error. I can either draw $N$ batches $B_i$ and take the average of their average errors (so the R.V. is the batch mean):

$$ \frac{1}{N} \sum^N_{i=1} \mu(B_i) $$

or I can collect all the errors and take one massive average (so the R.V. is the loss):

$$ \frac{1}{NB} \sum^{NB}_{i=1} L(z_i) $$
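As a worked step, and only under two assumptions not stated above, the two expressions can be compared directly. The assumptions are that every batch holds exactly $B$ samples and that $\mu(B_i)$ is the plain mean loss over batch $B_i$. The double index $z_{i,j}$ (sample $j$ of batch $i$) and the running index $k$ are notation introduced here for illustration:

$$ \frac{1}{N} \sum^N_{i=1} \mu(B_i) = \frac{1}{N} \sum^N_{i=1} \frac{1}{B} \sum^{B}_{j=1} L(z_{i,j}) = \frac{1}{NB} \sum^{NB}_{k=1} L(z_k) $$

If the batches had unequal sizes, the first equality would still hold batch by batch, but the last one would not in general.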

I’m fairly certain that they both have the same expected value, but do they have the same std? From my numerical experiments I don’t think they do (with the first one being superior to the second one, especially as B gets larger):

```
Error with average of averages
80%|████████ | 4/5 [01:12<00:18, 18.14s/it]
-> err = 12.598432122111321 +-1.7049395893844483
Error with sum of everything
80%|████████ | 4/5 [01:11<00:17, 17.77s/it]
-> err = 11.505614456176758 +-13.968025155156523
```
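The code behind the output above is not shown, so the following is only a minimal sketch of one way such a comparison might be set up. The values of `N` and `B`, the exponential loss distribution, and every variable name are hypothetical. It computes both averages on the same synthetic i.i.d. losses and reports a sample std for each, taken over the batch means in the first case and over the individual losses in the second:

```python
import random
import statistics

random.seed(0)

N, B = 5, 64  # hypothetical number of batches and batch size

# Hypothetical i.i.d. per-example losses (mean 12), grouped into N batches of B.
batches = [[random.expovariate(1 / 12.0) for _ in range(B)] for _ in range(N)]

# Estimator 1: average of the per-batch averages.
batch_means = [statistics.fmean(batch) for batch in batches]
avg_of_avgs = statistics.fmean(batch_means)
std_batch_means = statistics.stdev(batch_means)  # spread of the N batch means

# Estimator 2: one average over all N * B individual losses.
all_losses = [loss for batch in batches for loss in batch]
pooled_avg = statistics.fmean(all_losses)
std_losses = statistics.stdev(all_losses)  # spread of the individual losses

print(f"average of averages: {avg_of_avgs:.4f} +- {std_batch_means:.4f}")
print(f"sum of everything:   {pooled_avg:.4f} +- {std_losses:.4f}")
print(f"std_losses / sqrt(B): {std_losses / B ** 0.5:.4f}")
```

In this sketch the two point estimates agree up to floating-point rounding because all batches have the same size, while the two reported spreads describe different random variables: a mean of `B` losses versus a single loss. For i.i.d. losses the first spread is expected to be roughly the second divided by $\sqrt{B}$. Whether that is what happened in the experiment above is an assumption, since its two runs also report different point estimates.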

What is the difference? Is the covariance somehow affecting things, and if so, how?

I think I understand that I can just make the batch size very large instead of taking lots of averages, but now I am just annoyed that I don’t understand the difference between these two. I don’t think there should be a difference, and if there is one, when does it happen?
