
Setting BATCH SIZE when performing multi-class classification with imbalanced dataset

Data Science: Asked on May 6, 2021

I have a question regarding BATCH_SIZE on a multi-class classification task with imbalanced data. I have 5 classes and a small dataset of around 5000 examples. I have watched G. Hinton's lectures on Deep Learning, and he states that each mini-batch should ideally be balanced (meaning each batch should contain approximately the same number of data points for each class). This can be approximated by shuffling the data and then drawing random batches from it.

But, to my mind, this will only work if we have a somewhat large and BALANCED dataset. In my case, I think that setting BATCH_SIZE >= 16 might have a bad impact on learning, and the network will not be able to generalize. Is it better to maybe use SGD and update the weights after each sample has been processed (i.e. online training)? P.S. Keep in mind that I am using label smoothing and a class-weighted loss.
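For reference, here is a minimal sketch of that last point in PyTorch (an assumption; the question does not name a framework). The class counts below are hypothetical, and note that `nn.CrossEntropyLoss` supports class weights and label smoothing as two independent mechanisms (the `label_smoothing` argument requires PyTorch >= 1.10):

```python
import torch
import torch.nn as nn

# Hypothetical class counts for 5 imbalanced classes (~5000 examples total).
class_counts = torch.tensor([2500., 1200., 700., 400., 200.])

# Inverse-frequency class weights, normalized so they average to 1.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# Class weighting and label smoothing are separate things;
# CrossEntropyLoss can apply both at once.
criterion = nn.CrossEntropyLoss(weight=weights, label_smoothing=0.1)

logits = torch.randn(16, 5)           # a batch of 16 predictions
targets = torch.randint(0, 5, (16,))  # ground-truth class labels
loss = criterion(logits, targets)
```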

One Answer

There are two common options:

  1. Stratified sampling within each batch. Regardless of batch size, make sure each class is equally represented (see the sketch after this list). The downside of this approach is that it can significantly slow down training.

  2. Train with a larger batch size (say, 32-256); over the course of many epochs, the random class fluctuations within individual batches will average out.
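One way to approximate option 1, as a sketch rather than a definitive recipe, is PyTorch's `WeightedRandomSampler`, which draws each sample with probability inverse to its class frequency so that every batch is roughly class-balanced. The dataset here is a random stand-in for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Stand-in dataset: 5000 examples, 5 classes (real labels would be skewed).
X = torch.randn(5000, 20)
y = torch.randint(0, 5, (5000,))

# Weight each sample by the inverse frequency of its class, so all
# classes are drawn with roughly equal probability.
class_counts = torch.bincount(y, minlength=5).float()
sample_weights = 1.0 / class_counts[y]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(y),  # one epoch still visits ~5000 samples
    replacement=True,    # lets minority-class examples repeat
)

loader = DataLoader(TensorDataset(X, y), batch_size=32, sampler=sampler)
```

Note this is balanced random sampling rather than strict per-batch stratification: each batch is balanced only in expectation, which is usually sufficient in practice.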

Answered by Brian Spiering on May 6, 2021
