What is the state-of-the-art solution for text classification on large corpora?

Artificial Intelligence Asked by Nick on August 22, 2020

I need to classify documents in a collection that grows over time, starting from a small tagged training set. The task is binary classification. Training on the tagged set produces good results, but as the collection grows, and the vocabulary with it, the model's performance degrades. The model is a boosted Naive Bayes classifier applied to a tf-idf representation of the text. Each document is a reasonably sized news report.
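To make the setup concrete, here is a minimal sketch of a tf-idf plus Naive Bayes baseline in plain Python. It is not my actual code: the boosting step is left out, the toy sentences and all function names (`fit_idf`, `tfidf`, `fit_nb`, `predict`) are made up for illustration, and the smoothed IDF formula is one common choice among several. It does show the point where vocabulary growth bites: terms that were not in the training vocabulary are silently dropped when a new document is vectorized.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real news text needs more care).
    return text.lower().split()

def fit_idf(docs):
    """Learn the vocabulary and smoothed IDF weights from tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse tf-idf vector; terms outside the fitted vocabulary are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def fit_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over tf-idf weights with additive smoothing."""
    model = {}
    for c in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        totals = defaultdict(float)
        for v in rows:
            for t, w in v.items():
                totals[t] += w
        denom = sum(totals.values()) + alpha * len(vocab)
        log_prior = math.log(len(rows) / len(vectors))
        log_lik = {t: math.log((totals.get(t, 0.0) + alpha) / denom) for t in vocab}
        model[c] = (log_prior, log_lik)
    return model

def predict(vector, model):
    """Return the class with the highest log posterior score."""
    scores = {c: lp + sum(w * ll[t] for t, w in vector.items())
              for c, (lp, ll) in model.items()}
    return max(scores, key=scores.get)

# Toy tagged set (hypothetical data): 1 = finance, 0 = sport.
train = [
    ("stocks fall as markets slide", 1),
    ("markets rally on earnings", 1),
    ("team wins the final match", 0),
    ("coach praises the team", 0),
]
docs = [tokenize(text) for text, _ in train]
labels = [y for _, y in train]
idf = fit_idf(docs)
model = fit_nb([tfidf(d, idf) for d in docs], labels, vocab=set(idf))

# "again" never occurred in the tagged set, so it contributes nothing.
new_vec = tfidf(tokenize("markets slide again"), idf)
predicted = predict(new_vec, model)
```

As the collection grows, a larger share of each new document falls into that dropped category unless the vocabulary and IDF weights are refitted.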

What is the state-of-the-art solution for such a problem? I was thinking a semi-supervised approach might be a way forward: tagging new data with the previous model and training a new model on the result. So far, though, that has not been successful.
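The semi-supervised idea I describe is usually called self-training (pseudo-labelling). Roughly, it can be sketched as below. This is a simplified illustration, not my real pipeline: it uses plain multinomial Naive Bayes on token counts instead of boosted Naive Bayes on tf-idf, and the names `train_nb`, `prob_positive`, `self_train` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on token lists with binary labels 0/1.
    Assumes both classes occur at least once in `labels`."""
    vocab = {t for d in docs for t in d}
    counts = {0: Counter(), 1: Counter()}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    return vocab, counts, Counter(labels), alpha

def prob_positive(doc, model):
    """Return P(label = 1 | doc); tokens outside the vocabulary are ignored."""
    vocab, counts, n_docs, alpha = model
    total = sum(n_docs.values())
    log_score = {}
    for y in (0, 1):
        denom = sum(counts[y].values()) + alpha * len(vocab)
        s = math.log(n_docs[y] / total)
        for t in doc:
            if t in vocab:
                s += math.log((counts[y][t] + alpha) / denom)
        log_score[y] = s
    # Normalise the two log scores into a probability.
    m = max(log_score.values())
    e0, e1 = math.exp(log_score[0] - m), math.exp(log_score[1] - m)
    return e1 / (e0 + e1)

def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly pseudo-label confident documents and retrain on the result."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = train_nb(docs, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for d in pool:
            p = prob_positive(d, model)
            if p >= threshold:            # confident positive
                docs.append(d)
                labels.append(1)
                added += 1
            elif p <= 1 - threshold:      # confident negative
                docs.append(d)
                labels.append(0)
                added += 1
            else:                         # too uncertain, leave unlabeled
                remaining.append(d)
        pool = remaining
        if added == 0:
            break
        model = train_nb(docs, labels)    # vocabulary grows with the new docs
    return model, pool

# Toy example (hypothetical data): 1 = finance, 0 = sport.
labeled = [(["markets", "fall"], 1), (["team", "wins"], 0)]
unlabeled = [["markets", "fall", "sharply"], ["team", "wins", "cup"], ["weather"]]
model, leftover = self_train(labeled, unlabeled, threshold=0.75)
```

One known weakness of this loop is that every pseudo-label comes from the previous model, so its mistakes are fed back in as training data. The confidence threshold limits that but does not remove it, which may be part of why the approach has not helped in my case.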
