Should augmentation also be performed on the validation set when the dataset is imbalanced?

Question

I am training a CNN on images (2 classes) and I have an imbalanced dataset (1:7 ratio). I am trying to tackle this with offline image augmentation. Should I also augment the validation set, or is it OK if the validation set remains imbalanced?

Rishabh Sharma · Answer

I think you have the idea of augmentation slightly wrong. Image augmentation introduces variations into an existing image dataset through operations such as rotation, slicing, and mirroring, in order to make the model more robust. Applying it to imbalanced data leaves the result imbalanced as long as every operation is applied to the entire dataset: each class grows by the same factor, so a 1:7 ratio stays 1:7. Seen that way, the question of augmenting only the training set or both the training and validation sets doesn't make much sense.
I would recommend image augmentation only if you feel you have too few examples to train on and you also want to make your model more robust, and in that case apply it to your entire dataset.
You can also oversample your minority class if you feel it is suffering during training or its recall is low.
To oversample your minority class, you can perform image augmentation on the minority class only.
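
As a rough illustration of that last point, here is a minimal sketch of minority-only augmentation using plain NumPy. The function names `augment` and `oversample_minority` are hypothetical, the flips and 90-degree rotations merely stand in for whatever augmentations suit your data, and the code assumes square images stored as an `N x H x W x C` array.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly mirrored and rotated copy of one H x W x C image.

    Assumes H == W so that 90-degree rotations keep the shape unchanged.
    """
    if rng.random() < 0.5:
        image = np.fliplr(image)  # mirror left-right
    k = int(rng.integers(0, 4))  # number of 90-degree rotations (0 to 3)
    return np.rot90(image, k=k, axes=(0, 1)).copy()


def oversample_minority(images, labels, minority_label, seed=0):
    """Append augmented minority images until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(labels == minority_label)
    n_majority = int(np.sum(labels != minority_label))
    n_needed = n_majority - len(minority_idx)
    if n_needed <= 0 or len(minority_idx) == 0:
        return images, labels  # already balanced, or nothing to augment
    # Sample minority images with replacement and augment each sampled copy.
    picks = rng.choice(minority_idx, size=n_needed, replace=True)
    new_images = np.stack([augment(images[i], rng) for i in picks])
    new_labels = np.full(n_needed, minority_label, dtype=labels.dtype)
    return (np.concatenate([images, new_images]),
            np.concatenate([labels, new_labels]))


# Toy data with the 1:7 ratio from the question: 10 minority, 70 majority.
images = np.random.default_rng(1).random((80, 32, 32, 3))
labels = np.array([1] * 10 + [0] * 70)
bal_images, bal_labels = oversample_minority(images, labels, minority_label=1)
```

In this toy run the original 80 images are kept unchanged and 60 augmented minority images are appended, so both classes end up with 70 examples.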

David Masip · Answer

Referring to a previous question, there is no reason to tackle imbalance unless your model is not learning properly with the imbalanced dataset. Besides, 1:7 is not that big of an imbalance.
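
One way to judge whether the model is "not learning properly" is to look at recall per class rather than overall accuracy. The sketch below is only an illustration: `per_class_recall` is a hypothetical helper, and the labels are toy data with the 1:7 ratio from the question. It shows how a model that always predicts the majority class scores high accuracy while completely missing the minority class.

```python
import numpy as np


def per_class_recall(y_true, y_pred):
    """Return {class: recall}, the fraction of each true class predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = {}
    for cls in np.unique(y_true):
        mask = y_true == cls
        recalls[int(cls)] = float(np.mean(y_pred[mask] == cls))
    return recalls


# Toy validation labels with a 1:7 ratio: 10 minority (1), 70 majority (0).
y_val = np.array([1] * 10 + [0] * 70)

# A degenerate "model" that always predicts the majority class.
y_pred_trivial = np.zeros_like(y_val)

accuracy = float(np.mean(y_pred_trivial == y_val))  # 0.875
recalls = per_class_recall(y_val, y_pred_trivial)   # {0: 1.0, 1: 0.0}
```

If the minority-class recall of your real model looks acceptable on the validation set, the imbalance may not need any special treatment.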
