TransWikia.com

SMOTE for multilabel classification

Data Science, asked on October 6, 2020

I have a dataset with 77 different labels. Each sample has one or more of these labels.

I did some data analysis and found that the dataset is highly imbalanced: a large number of samples have one particular label, whereas the other labels occur much less frequently across the data.
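A quick way to quantify this kind of imbalance is to count how many samples carry each label in the binary indicator matrix (one row per sample, one column per label) and compare each count with the most frequent label's count. This is a minimal sketch; the function name `label_imbalance` and the toy matrix are hypothetical, and the "max count / label count" ratio is just one common way to express per-label imbalance.

```python
import numpy as np

def label_imbalance(Y):
    """Per-label positive counts and imbalance ratios for a binary
    indicator matrix Y of shape (n_samples, n_labels)."""
    Y = np.asarray(Y)
    counts = Y.sum(axis=0)                # samples carrying each label
    # Imbalance ratio per label: count of the most frequent label divided
    # by this label's count (inf for labels that never occur).
    with np.errstate(divide="ignore"):
        ratios = counts.max() / counts
    return counts, ratios

# Hypothetical toy indicator matrix: 6 samples, 3 labels.
Y = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
])
counts, ratios = label_imbalance(Y)
print(counts)   # [6 2 1]
print(ratios)   # [1. 3. 6.]
```

With 77 labels, sorting the labels by `counts` shows at a glance which ones are rare enough to need special treatment.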

I'm trying to use SMOTE to synthesize new samples for the minority labels, but apparently imblearn's SMOTE doesn't support multilabel data. Is there an alternative to SMOTE that I can use for multilabel classification, or should I treat my problem as 77 separate binary classification problems and apply SMOTE to each one separately?
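The second option, splitting the task into one binary problem per label (usually called binary relevance) and oversampling each label independently, could look roughly like the sketch below. It is a hedged illustration, not a reference implementation: `smote` is a hand-written minimal version of SMOTE's interpolation step used in place of imblearn so the example is self-contained, and `binary_relevance_smote`, the `seed` parameter and the toy data are all hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic points, each on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < k + 1:
        raise ValueError(f"need at least k + 1 = {k + 1} minority samples, got {n}")
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)       # random minority samples
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    lam = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[nn] - X_min[base])

def binary_relevance_smote(X, Y, k=5, seed=0):
    """Treat each column of the indicator matrix Y as its own binary problem and
    oversample that label's positives until they match its negatives."""
    rng = np.random.default_rng(seed)
    X, Y = np.asarray(X, dtype=float), np.asarray(Y)
    datasets = {}
    for j in range(Y.shape[1]):
        y = Y[:, j]
        pos = X[y == 1]
        n_new = int((y == 0).sum()) - len(pos)
        if n_new <= 0 or len(pos) < k + 1:
            # Positives are not the minority, or too few to find k neighbours:
            # keep this label's training data unchanged.
            datasets[j] = (X, y)
            continue
        X_syn = smote(pos, n_new, k=k, rng=rng)
        datasets[j] = (np.vstack([X, X_syn]),
                       np.concatenate([y, np.ones(n_new, dtype=y.dtype)]))
    return datasets

# Hypothetical toy data: 20 samples, 2 features, 2 labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 2))
Y = np.zeros((20, 2), dtype=int)
Y[:3, 0] = 1    # label 0: 3 positives, too few for k=5
Y[:8, 1] = 1    # label 1: 8 positives vs 12 negatives
data = binary_relevance_smote(X, Y, k=5)
print(data[0][0].shape, data[1][0].shape)   # (20, 2) (24, 2)
```

Each label ends up with its own resampled training set, so one binary classifier is trained per label. The sets are kept separate on purpose: a synthetic point is only labelled for the label it was generated for, and its values for the other labels are undefined. In practice the hand-written helper could be replaced by imblearn's `SMOTE(k_neighbors=k).fit_resample(X, Y[:, j])` inside the same loop.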

One Answer

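If you have only one example for certain classes, SMOTE won't work: it creates new samples by interpolating between a minority example and its nearest neighbours from the same class, so a class with a single example has no neighbour to interpolate towards. Most machine learning algorithms won't work either.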
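SMOTE's generation step makes the reason concrete. In the standard formulation, each synthetic point is placed on the segment between a minority sample and one of its k nearest neighbours within the minority class:

```latex
\mathbf{x}_{\text{new}} = \mathbf{x}_i + \lambda\,(\mathbf{x}_{nn} - \mathbf{x}_i),
\qquad \lambda \sim \mathcal{U}(0, 1),
\qquad \mathbf{x}_{nn} \in \operatorname{kNN}_{\text{min}}(\mathbf{x}_i)
```

With a single minority example the neighbour set is empty, so there is no second point to interpolate towards. More generally the class needs at least k + 1 samples; imbalanced-learn's `SMOTE` defaults to `k_neighbors=5`, so it expects at least 6 examples of the minority class unless `k_neighbors` is lowered.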

There is a technique called one-shot learning (OSL), which is normally used in computer vision: "Whereas most machine learning-based object categorization algorithms require training on hundreds or thousands of samples/images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training samples/images."

Maybe you could try an OSL approach to help with the classification, since standard ML algorithms need more samples to be able to generalize.
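Metric-based one-shot methods (Siamese networks, for example) classify a new input by comparing it with the handful of stored examples in a learned embedding space. Only that comparison step is sketched below, assuming the embeddings have already been computed by some model trained on the frequent labels; the cosine metric, the function name `one_shot_predict` and the toy vectors are illustrative assumptions, not something the answer prescribes.

```python
import numpy as np

def one_shot_predict(support_X, support_y, query_X):
    """Give each query the label of its most similar support example
    (cosine similarity). support_X may hold just one example per class."""
    S = np.asarray(support_X, dtype=float)
    Q = np.asarray(query_X, dtype=float)
    # Unit-normalise rows so the dot product equals cosine similarity
    # (assumes no all-zero rows).
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    sims = Q @ S.T                            # shape (n_queries, n_support)
    return np.asarray(support_y)[sims.argmax(axis=1)]

# Hypothetical embeddings: one stored example for each of two rare labels.
support_X = [[1.0, 0.0], [0.0, 1.0]]
support_y = ["A", "B"]
print(one_shot_predict(support_X, support_y, [[0.9, 0.1], [0.2, 0.8]]))  # ['A' 'B']
```

Because the prediction is just an index into `support_y`, passing a 2-D indicator matrix instead of a list of names returns the full label set of the nearest stored example, which fits a multilabel setting. The quality of such a classifier depends almost entirely on the embedding, which is the part one-shot methods actually learn.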

Correct answer by Carlos Mougan on October 6, 2020

© 2024 TransWikia.com. All rights reserved.