TransWikia.com

Should augmentation also be performed on the validation set when the dataset is imbalanced?

Data Science Asked by Denisa Ionascu on July 14, 2021

I am training a CNN on images (2 classes) and my dataset is imbalanced (1:7 ratio). I am trying to tackle this with offline image augmentation. Should I also augment the validation set, or is it OK for the validation set to remain imbalanced?

2 Answers

I think you have the idea of augmentation slightly wrong. Image augmentation introduces variation into an existing image dataset through operations such as rotation, slicing, and mirroring, in order to make the model more robust. If those operations are applied uniformly to the entire dataset, an imbalanced dataset stays imbalanced: every class grows by the same factor, so the class ratio does not change. Seen that way, the question of augmenting only the training set versus both the training and validation sets does not really address the imbalance.
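To make the arithmetic concrete, here is a minimal sketch. The class counts and the number of augmented copies per image (`k`) are made-up values chosen only to match the 1:7 ratio from the question.

```python
# Hypothetical class counts matching the 1:7 ratio from the question.
counts = {"minority": 100, "majority": 700}

# Assumption: every image, regardless of class, yields k augmented copies.
k = 3
augmented = {cls: n * (1 + k) for cls, n in counts.items()}

ratio_before = counts["majority"] / counts["minority"]
ratio_after = augmented["majority"] / augmented["minority"]

# Both ratios are identical: uniform augmentation scales every class equally.
print(ratio_before, ratio_after)
```

Whatever value `k` takes, it cancels out of the ratio, which is why uniform augmentation alone cannot rebalance the classes.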

I would recommend image augmentation only if you feel you have too few examples to train on and you want to make your model more robust, and in that case apply it to your entire dataset.

You can also oversample your minority class if you feel it is suffering during training or its recall is low.

To oversample your minority class, you can perform image augmentation on your minority class only.
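A minimal sketch of this idea, using only NumPy. The function names `augment` and `oversample_minority`, the flip-only transforms, and the random placeholder images are all illustrative assumptions, not part of the original answer. The sketch also assumes it is run on the training split only, so that the validation set keeps the real class distribution.

```python
import numpy as np

def augment(image, rng):
    """Return a flipped copy of an (H, W, C) image (shape is preserved)."""
    op = int(rng.integers(0, 3))
    if op == 0:
        return np.fliplr(image).copy()             # mirror left-right
    if op == 1:
        return np.flipud(image).copy()             # mirror top-bottom
    return np.flipud(np.fliplr(image)).copy()      # both flips (180 degree turn)

def oversample_minority(images, labels, minority_label, rng):
    """Add augmented minority images until both classes have equal counts."""
    minority_idx = np.flatnonzero(labels == minority_label)
    n_majority = len(labels) - len(minority_idx)
    n_needed = n_majority - len(minority_idx)
    if n_needed <= 0:
        return images, labels
    # Sample minority images with replacement and augment each sampled copy.
    picks = rng.choice(minority_idx, size=n_needed, replace=True)
    new_images = np.stack([augment(images[i], rng) for i in picks])
    new_labels = np.full(n_needed, minority_label, dtype=labels.dtype)
    return (np.concatenate([images, new_images]),
            np.concatenate([labels, new_labels]))

# Placeholder training data with a 1:7 ratio (10 minority, 70 majority).
rng = np.random.default_rng(0)
train_images = rng.random((80, 8, 8, 3))
train_labels = np.array([1] * 10 + [0] * 70)

bal_images, bal_labels = oversample_minority(train_images, train_labels, 1, rng)
print(bal_images.shape, np.bincount(bal_labels))
```

In a real pipeline the flips would be replaced by whatever transformations suit the images, and the balanced arrays would feed the training loop while validation data is left untouched.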

Answered by Rishabh Sharma on July 14, 2021

Referring to a previous question, there is no reason to tackle the imbalance unless your model is not learning properly from the imbalanced dataset. Besides, 1:7 is not a severe imbalance.

Answered by David Masip on July 14, 2021
