
How to make use of POS tags as useful features for a NaiveBayesClassifier for sentiment analysis?

Data Science · Asked on August 27, 2021

I’m doing sentiment analysis on a Twitter dataset (problem link). I have extracted POS tags from the tweets, created TF-IDF vectors from those tags, and used them as features (this gave an accuracy of 65%). But I think we can get a lot more out of POS tags, since they help distinguish how a word is being used within the scope of a phrase. The model I’m training is MultinomialNB().

The problem I’m trying to solve is to classify the sentiment of each tweet as positive, negative, or neutral.

Structure of the dataset: (image omitted)

Created POS tags: (image omitted)
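The tagging step itself isn’t shown above; as one possibility, a minimal sketch of producing such a 'pos' column with NLTK could look like the following (the 'tweet' column name and the use of NLTK are assumptions, not part of the original setup):

import nltk
from nltk import word_tokenize, pos_tag

# One-time downloads of the tokenizer and tagger resources
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

def tweet_to_pos_string(tweet):
    """Return the tweet's POS tags as one space-separated string,
    so it can be fed to a TfidfVectorizer like ordinary text."""
    tokens = word_tokenize(tweet)
    return ' '.join(tag for _, tag in pos_tag(tokens))

train_data['pos'] = train_data['tweet'].apply(tweet_to_pos_string)
test_data['pos'] = test_data['tweet'].apply(tweet_to_pos_string)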

I created TF-IDF vectors from the POS tag strings and fed them to my model:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# TF-IDF over the POS-tag strings (unigrams and bigrams of tags)
tfidf_vectorizer1 = TfidfVectorizer(
    max_features=5000, min_df=2, max_df=0.9, ngram_range=(1, 2))
train_pos = tfidf_vectorizer1.fit_transform(train_data['pos'])
test_pos = tfidf_vectorizer1.transform(test_data['pos'])

# Multinomial Naive Bayes trained on the POS features only
clf = MultinomialNB(alpha=0.1).fit(train_pos, train_labels)
predicted = clf.predict(test_pos)

With the above code I got 65% accuracy. Rather than creating TF-IDF vectors from the POS tags alone and using them as model inputs, is there any other way to use POS tags to increase the accuracy of the model?
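For instance, a sketch of one possible direction would be to keep ordinary word TF-IDF features and stack the POS TF-IDF columns next to them (the word-level vectorizer, the 'tweet' column name, and the scipy.sparse.hstack usage here are only assumptions, not something tested on this dataset):

from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Assumed word-level vectorizer over the raw tweet text
tfidf_words = TfidfVectorizer(
    max_features=5000, min_df=2, max_df=0.9, ngram_range=(1, 2))
train_words = tfidf_words.fit_transform(train_data['tweet'])
test_words = tfidf_words.transform(test_data['tweet'])

# Put word and POS features side by side in one sparse matrix
train_combined = hstack([train_words, train_pos])
test_combined = hstack([test_words, test_pos])

clf = MultinomialNB(alpha=0.1).fit(train_combined, train_labels)
predicted = clf.predict(test_combined)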

One Answer

There are many ways you could go about this. For starters, you could use Conditional Random Fields (CRFs); there is a nice Python implementation in which you can set POS features and more. The same source you posted also has a page on how to use CRFs for your purpose (I have not read it thoroughly). spaCy is another great resource for quickly getting all the features you need. Nonetheless, for state-of-the-art results you will need a neural network implementation.
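As a rough illustration of the spaCy route (only a sketch; the small English model and the 'tweet' column name are assumptions), the POS tag can be glued onto each token so the same TF-IDF + MultinomialNB pipeline sees word/tag pairs instead of tags alone:

import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Load only the tagger components for speed
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

def pos_augmented(text):
    """Turn 'I love this' into 'i_PRON love_VERB this_DET'."""
    return ' '.join(f'{tok.lower_}_{tok.pos_}' for tok in nlp(text))

train_aug = train_data['tweet'].apply(pos_augmented)
test_aug = test_data['tweet'].apply(pos_augmented)

vec = TfidfVectorizer(max_features=5000, min_df=2, max_df=0.9, ngram_range=(1, 2))
clf = MultinomialNB(alpha=0.1).fit(vec.fit_transform(train_aug), train_labels)
predicted = clf.predict(vec.transform(test_aug))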

Answered by 20roso on August 27, 2021
