TransWikia.com

Is it better to have one model with more categories, or two models with fewer categories each, for multi-label classification?

Data Science Asked by Nouf on May 12, 2021

For classifying text into three classes (question, complaint and compliment),
where each sample can have multiple labels (question and complaint, question and compliment):

  • is it better to have one model for all three targets?
  • or two models, the first for (question or not) and the second for (complaint, compliment or something else)?

Which approach is better when the data include both labeled and unlabeled samples and are unbalanced?

One Answer

Generally there are two main options for multi-label classification:

  • A binary model for every class. This approach treats the labels as independent of each other.
  • A joint model which learns all the classes together, so at training time each class encodes the values of all three labels, for instance:
    • class "NNN" means no label at all
    • class "NNY" means no for question, no for complaint and yes for compliment
    • class "NYN" means no for question, yes for complaint and no for compliment
    • ...
    • class "YYY" means yes for question, yes for complaint and yes for compliment
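
As a minimal sketch of the two target encodings described above, the following plain-Python snippet turns a sample's label set into either one 0/1 target per label (for separate binary models) or a single "Y"/"N" string (for a joint model). The label order, the function names and the toy sample are assumptions made for illustration; they are not part of the original answer.

```python
# Hypothetical label order; the "Y"/"N" strings follow the notation used above.
LABELS = ("question", "complaint", "compliment")


def to_binary_targets(label_set):
    """One 0/1 target per label, as used when training one binary model per class."""
    return {name: int(name in label_set) for name in LABELS}


def to_joint_class(label_set):
    """A single class string such as "YYN", as used when training a joint model."""
    return "".join("Y" if name in label_set else "N" for name in LABELS)


def from_joint_class(joint):
    """Decode a joint class string back into the set of labels it stands for."""
    return {name for name, flag in zip(LABELS, joint) if flag == "Y"}


# Toy sample (an assumption): a text that is both a question and a complaint.
sample = {"question", "complaint"}
binary_targets = to_binary_targets(sample)
joint_class = to_joint_class(sample)
```

With the binary encoding, each of the three models sees only its own 0/1 column. With the joint encoding, a single multi-class model predicts the whole string at once and the prediction is decoded back with `from_joint_class`.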

In theory there are $2^3 = 8$ possible classes for 3 labels, but in practice there might be fewer combinations in the data (as suggested in the question). If the labels depend on each other (for instance if it's unlikely to have a document which is both a complaint and a compliment), a joint model is more appropriate. However, a joint model might need more training instances, since it has to distinguish up to eight classes, some of which may be rare, whereas each binary model only separates two.
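
One rough way to see how many of the $2^3$ combinations actually occur, and how few instances some of them have, is to count the joint classes in the labeled data. The sketch below does this on a handful of made-up annotations; the samples, the label order and the variable names are assumptions for illustration only.

```python
from collections import Counter
from itertools import product

# Hypothetical label order, matching the "Y"/"N" notation used above.
LABELS = ("question", "complaint", "compliment")

# Made-up toy annotations, not real data.
samples = [
    {"question"},
    {"question", "complaint"},
    {"complaint"},
    {"compliment"},
    {"question", "compliment"},
    {"complaint"},
    set(),
]


def joint_class(label_set):
    """Encode a label set as a string such as "YYN"."""
    return "".join("Y" if name in label_set else "N" for name in LABELS)


# All 2^3 theoretical classes, from "NNN" to "YYY".
all_classes = ["".join(flags) for flags in product("NY", repeat=len(LABELS))]

# How often each combination really appears in the data.
observed = Counter(joint_class(s) for s in samples)

# Combinations a joint model would never see during training.
unseen = [c for c in all_classes if c not in observed]

# Number of samples that are both a complaint and a compliment.
both = sum(1 for s in samples if {"complaint", "compliment"} <= s)
```

On real data, a short `unseen` list with many classes holding only a few instances would point to the data-hungry side of the joint model, while a near-zero `both` count is the kind of label dependence that a joint model can exploit.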

Answered by Erwan on May 12, 2021
