TransWikia.com

Classifying whether a comment or review is a complaint or appreciation of a product, and extracting the topic?

Data Science Asked by faizan on March 4, 2021

I need to classify whether a given review or comment is a complaint or an appreciation. This is planned to be used in multiple places: the product review pages of our own site as well as Facebook and Twitter. Suggestions on how to approach this would be appreciated.

The problems that are confusing me:

  1. On FB/Twitter I don't know which product a post is about, so I need to extract that from the text as well.
  2. I need to extract the complaint/appreciation part and group similar ones together (e.g. merge "good color reproduction" and "great clarity" into just "good display").
  3. Articles (each document) will vary in size.
  4. No data is available yet; I will prepare it by going through our FB etc.

My initial thought was LSTM-based classification, but points 3 and 4 make that hard. And even with 3 and 4 solved, how do I go about 1 and 2? I have only played with word2vec a bit and done some dummy Twitter sentiment analysis projects. Points 1 and 2 seem to be information retrieval problems, and I need pointers for those.

One Answer

Your (basic) task is sentiment analysis, covered in many places.

A number of algorithms have proven good for this, including LSTMs, but you need a good deal of data (and compute power) to train one. Another option is fastText, a tool by Facebook, for which pretrained embeddings already exist for a number of languages, including English.

Either way, you need 'ground truth', i.e. samples of positive and negative posts/reviews, to train the final classification model. Assuming you have standard English reviews, meaning a lot of language typical of online reviews, you can use some of the existing datasets to train your model and boost results. For example, there is the Yelp review dataset, which actually exists in a number of versions. Otherwise you would have to manually select representative samples (the more the better) and label them (pos/neg).
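To make the role of labelled samples concrete, here is a minimal baseline sketch. It is a bag-of-words Naive Bayes classifier written in plain Python, used as a simple stand-in and not the LSTM or fastText approach mentioned above. The training sentences and the `appreciation`/`complaint` label names are made up for illustration; a real model would need far more labelled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled samples (assumption, for illustration only).
train_texts = [
    "great phone, love the display",
    "excellent battery and great camera",
    "terrible battery, stopped working after a week",
    "awful support, the screen broke",
]
train_labels = ["appreciation", "appreciation", "complaint", "complaint"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("love the great camera"))
print(model.predict("the battery stopped working"))
```

Because the model only counts words, it does not care how long a document is, which sidesteps the varying-length concern for a first baseline.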

The other tasks, extracting the product and extracting the complaint/appreciation part, are entity extraction and topic modelling respectively, each a separate story.

For entity (product) extraction, you can possibly just search the post for the names from your inventory. Assuming English again, just beware of plural forms. You will also need a good data structure/index for this to get results fast.
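The inventory lookup could be sketched roughly as follows. This is one possible approach, not a definitive one: product names are stored in a dictionary keyed by their normalized tokens, and the post is scanned for matching token n-grams, longest first. The inventory names are hypothetical, and the plural handling is a deliberately crude assumption (it just strips a trailing "s").

```python
import re


def normalize(token):
    """Crude singularization: strip a trailing 's' from longer tokens."""
    # Assumption: a naive rule that mangles words such as "lens" or "series";
    # a proper stemmer or lemmatizer would be more robust.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and normalize each one."""
    return [normalize(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(product_names):
    """Map each normalized token tuple to its canonical product name."""
    index = {tuple(tokenize(name)): name for name in product_names}
    max_len = max(len(key) for key in index)
    return index, max_len


def find_products(text, index, max_len):
    """Scan token n-grams (longest first) and look each up in the index."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            name = index.get(tuple(tokens[i:i + n]))
            if name is not None:
                found.append(name)
                i += n - 1  # skip the tokens covered by this match
                break
        i += 1
    return found


# Hypothetical inventory (assumption, for illustration only).
inventory = ["Aurora Phone", "Aurora Phone Case", "Nimbus Headphones"]
index, max_len = build_index(inventory)

post = "My Aurora phones both died, but the nimbus headphone is fine"
print(find_products(post, index, max_len))
```

Each lookup is a hash-table access, so the cost grows with the length of the post and the longest product name, not with the size of the inventory.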

For topic modelling, there is an LDA implementation in gensim. You will have to go through the resulting topics manually and process them depending on the exact scenario in which you want to use them, e.g. assign each a meaningful name while selecting those that make sense for you and merging some.
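The manual naming and merging step might be recorded as a simple mapping, as in the sketch below. It does not run LDA itself: the `topics` dictionary is a hypothetical stand-in for the top words a topic model would produce, and the names in `topic_names` are invented. The word-overlap labelling at the end is only a rough illustration; a trained model would normally supply the topic for each document.

```python
from collections import defaultdict

# Hypothetical output of a topic model: topic id -> top words.
topics = {
    0: ["color", "reproduction", "vivid", "screen"],
    1: ["clarity", "sharp", "resolution", "screen"],
    2: ["battery", "charge", "drain", "hours"],
    3: ["the", "and", "very", "really"],  # noise topic with no useful meaning
}

# Result of the manual review: one name per topic that is kept. Topics that
# share a name are merged; topics left out (here topic 3) are discarded.
topic_names = {0: "display", 1: "display", 2: "battery"}


def merge_topics(topics, topic_names):
    """Group the top words of all topics that share a reviewer-assigned name."""
    merged = defaultdict(list)
    for topic_id, words in topics.items():
        name = topic_names.get(topic_id)
        if name is None:
            continue  # topic was dropped during review
        for word in words:
            if word not in merged[name]:
                merged[name].append(word)
    return dict(merged)


def label_comment(tokens, merged):
    """Return the name whose word list overlaps most with the comment."""
    scores = {name: len(set(tokens) & set(words))
              for name, words in merged.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


merged = merge_topics(topics, topic_names)
print(merged["display"])
print(label_comment(["great", "clarity", "and", "color"], merged))
```

With this mapping, "good color reproduction" and "great clarity" both end up under the single name "display", which is the kind of grouping asked for in the question.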

Correct answer by MkL on March 4, 2021
