Positive/negative training sample imbalance in multi-label image classifiers

Data Science: Asked by trzy on September 4, 2021

I’m trying to train VGG-16 on the Pascal VOC 2012 dataset, which has images with 20 labels (and a given image can have multiple classes present). The examples are highly imbalanced, so I’ve "balanced" them such that each label is roughly equally represented in the training set.

But with 20 labels, this means that for each label only about 5% of the images are positive examples and 95% are negative. There is no way to achieve a 50/50 split for every class simultaneously.
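(To make the imbalance concrete, here is how the per-label positive fraction could be computed from a multi-hot label matrix; a numpy sketch with stand-in random data, not my actual loader:)

```python
import numpy as np

# y: (num_images, 20) multi-hot label matrix for the training set.
# Stand-in random data; a real loader would build this from VOC annotations.
y = np.random.binomial(1, 0.05, size=(10000, 20))

pos_frac = y.mean(axis=0)                   # fraction of positive images per label (~0.05)
neg_over_pos = (1.0 - pos_frac) / pos_frac  # negatives per positive (~19)
print(pos_frac.round(3))
print(neg_over_pos.round(1))
```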

I’m using binary cross-entropy loss and a sigmoid activation on the final VGG layer, since this is a multi-label problem. Binary accuracy looks great, but the per-class results are pretty dismal (~15% recall). The classifier is not fitting the positive examples and is biased toward predicting negatives, because that matches the data distribution (very few positive samples per label).
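Concretely, the setup looks roughly like the following (a minimal tf.keras sketch; the input size, pooling, and optimizer here are illustrative, not my exact training code):

```python
import tensorflow as tf

NUM_CLASSES = 20  # Pascal VOC 2012 object categories

# VGG-16 backbone with a 20-unit sigmoid head: each output is an
# independent per-label probability, trained with binary cross-entropy.
base = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
head = tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid")(base.output)
model = tf.keras.Model(base.input, head)

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])
```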

What is typically done in this scenario? The original VGG paper appears to train on mutually exclusive classes (single-label, softmax). Should I be using a custom loss function?
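For example, would a positive-weighted binary cross-entropy be the right direction? A sketch (the pos_weight of 19 simply reflects the ~5%/95% split described above):

```python
import tensorflow as tf

def weighted_bce(pos_weight):
    """Binary cross-entropy with the positive term up-weighted.
    pos_weight is roughly (# negatives / # positives) per label,
    e.g. ~19 for a 5%/95% split."""
    w = tf.constant(pos_weight, dtype=tf.float32)

    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)  # avoid log(0)
        per_label = -(w * y_true * tf.math.log(y_pred)
                      + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        return tf.reduce_mean(per_label)

    return loss

# model.compile(optimizer="adam", loss=weighted_bce(19.0),
#               metrics=["binary_accuracy"])
```

Or would something like a focal loss, which down-weights easy, confidently-classified negatives, be more appropriate here?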
