TransWikia.com

K-means and LDA for text classification

Data Science Asked on March 9, 2021

I will try to explain clearly what I would like to do. I have more than 50,000 tweets and I would like to label them by topic, so I used LDA for this. I have also used k-means to group the tweets and tried to predict the cluster (but not the topic). I would like to know whether it is possible to associate the topics found by LDA with the k-means clusters, or whether the two approaches are worthless when used together.

One Answer

Try to look at it from a broader perspective: what you essentially want is to build a kind of multi-class classification model based on your clusters and the topics LDA assigned.

It's like wanting a clustering model and a classification model to do the same thing. Whether they do ultimately depends on the vector space the data points are in and on their respective vectors.

I assume you are using some tweet-to-vector method to represent each tweet, such as bag of words, GloVe, or doc2vec. Clustering in this vector space and hoping it produces the same segregation that LDA uses to assign topics is quite far-fetched. I don't think your vector space will be so closely in line with the topics LDA produces that your clusters reflect them. Also, clustering puts data points in the same cluster based on certain attributes, but determining why a cluster was formed from those attributes is very difficult given the dimensionality of the data.
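If you still want to check how far the clusters and the topics agree on your data, one option is to count how many tweets fall into each (cluster, topic) pair and compute an agreement score. The following is only a minimal sketch, assuming scikit-learn, TF-IDF vectors for k-means, raw counts for LDA, and the same number of clusters and topics. The toy `tweets` list and the names `n_groups`, `overlap`, and `cluster_to_topic` are made up for illustration and are not from the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import normalized_mutual_info_score

# Toy stand-in for the tweets (hypothetical data, not from the question).
tweets = [
    "the team won the football match last night",
    "great goal in the football game",
    "the coach praised the team after the match",
    "new phone released with a better camera",
    "the phone battery and camera are great",
    "software update improves phone battery life",
    "the election results were announced today",
    "voters went to the polls for the election",
    "the government announced a new policy after the vote",
]

n_groups = 3  # assumed: same number of k-means clusters and LDA topics

# k-means on TF-IDF vectors (one possible tweet-to-vector choice).
tfidf = TfidfVectorizer(stop_words="english").fit_transform(tweets)
kmeans = KMeans(n_clusters=n_groups, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(tfidf)

# LDA on raw term counts; keep each tweet's most probable topic.
counts = CountVectorizer(stop_words="english").fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=n_groups, random_state=0)
doc_topic = lda.fit_transform(counts)  # rows are per-tweet topic proportions
dominant_topic = doc_topic.argmax(axis=1)

# overlap[c, t] = number of tweets in cluster c whose dominant topic is t.
overlap = np.zeros((n_groups, n_groups), dtype=int)
np.add.at(overlap, (cluster_labels, dominant_topic), 1)
print(overlap)

# For each cluster, the topic that most of its tweets were assigned to.
cluster_to_topic = overlap.argmax(axis=1)
print(cluster_to_topic)

# 1.0 means the two partitions coincide (up to relabeling);
# values close to 0 mean they share little structure.
nmi = normalized_mutual_info_score(cluster_labels, dominant_topic)
print(f"NMI between clusters and topics: {nmi:.2f}")
```

If most of the counts in each row of `overlap` sit in a single column, labeling a cluster with its majority topic is reasonable. If the counts are spread across columns, the clusters and the topics are capturing different structure, which is the outcome I would expect in most cases.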

Answered by Rishabh Sharma on March 9, 2021
