
What is the best way to compare these small distributions?

Data Science Asked by Dieshe on January 7, 2021

I have one distribution of size 30.

These are the results (ROC-AUC, for example) from training a neural network 30 times in a row with the same hyperparameters; since the weights are randomly initialized, the result is always a little bit different.

Then I train the same network with other hyperparameters, but I only want to do fewer runs. Let's say 5 runs.

My null hypothesis is that the distribution from the smaller set of runs is not smaller than the distribution from the 30 runs (one-sided test).

What kind of statistical significance test would be the best to compare these small distributions?

PS: At the moment I am using Mann Whitney U Test. Is there anything better?
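For reference, here is a minimal sketch of the kind of comparison I am running, with made-up AUC values standing in for my real results:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical ROC-AUC scores: 30 baseline runs and 5 runs with new hyperparameters.
auc_baseline = rng.normal(loc=0.85, scale=0.01, size=30)
auc_new = rng.normal(loc=0.85, scale=0.01, size=5)

# One-sided Mann-Whitney U test: H1 is that the new runs tend to score lower.
stat, p = mannwhitneyu(auc_new, auc_baseline, alternative="less")
print(f"U = {stat:.1f}, p = {p:.3f}")
```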

One Answer

To assess the effects of hyperparameters on the neural network results, you need to eliminate confounding variables. Let's say you have network $A_t$ and network $B_t$ corresponding to trial/sample $t$. The randomly initialized weights of $A_t$ should equal those of $B_t$. If you are using a stochastic optimizer (e.g. SGD), you also need to ensure that the random instance selection is the same between training $A_t$ and $B_t$.
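This pairing setup can be sketched with seeded NumPy generators; the `init_weights` and `data_order` helpers below are hypothetical stand-ins for your framework's weight initialization and minibatch shuffling:

```python
import numpy as np

def init_weights(seed, shape=(4, 4)):
    # Same seed -> identical initial weights for networks A_t and B_t.
    return np.random.default_rng(seed).normal(size=shape)

def data_order(seed, n=100):
    # Same seed -> identical instance ordering for the stochastic optimizer.
    return np.random.default_rng(seed).permutation(n)

seeds = [0, 1, 2, 3, 4]  # one fixed seed per paired trial t
for t in seeds:
    # Both configurations reuse seed t, so each trial forms a matched pair.
    assert np.array_equal(init_weights(t), init_weights(t))
    assert np.array_equal(data_order(t), data_order(t))
```

The point is that the only thing allowed to differ between $A_t$ and $B_t$ is the hyperparameter setting itself.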

Once you've eliminated confounding variables, you effectively have paired samples, which you can compare via a Wilcoxon signed-rank test. For small sample sizes, it can be preferable to a paired t-test because (1) you can't verify that the samples are normally distributed, and (2) the Wilcoxon test effectively evaluates the median rather than the mean. The median is more robust against outliers, which are particularly influential when you only have a few samples.
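A minimal sketch of the paired comparison, using hypothetical ROC-AUC values where pair $t$ shares the same initialization and data order:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired ROC-AUC results: trial t of config A vs. trial t of config B.
auc_a = np.array([0.851, 0.847, 0.853, 0.849, 0.850, 0.852, 0.848])
auc_b = np.array([0.862, 0.859, 0.861, 0.858, 0.860, 0.865, 0.855])

# One-sided Wilcoxon signed-rank test: H1 is that config A scores lower than B.
stat, p = wilcoxon(auc_a, auc_b, alternative="less")
print(f"W = {stat}, p = {p:.4f}")
```

Note that `wilcoxon` requires the two arrays to be the same length, since it operates on the per-pair differences.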

Answered by Benji Albert on January 7, 2021
