TransWikia.com

Text topic classification in tensorflow

Data Science Asked on December 5, 2020

I want to create a CNN in TensorFlow that classifies a recipe headline and finds its topic.
For instance, "super yummy cheesy cake" should result in "cheese cake".

I was thinking of going with TensorFlow, but I need some help getting started.

My strategy is as follows:

  1. Normalize the headlines so that, for instance, "cheesy" becomes "cheese" and "cheesecake" becomes "cheese cake".
  2. Build a dataset like:

    • super yummy cheesecake | cheese cake
    • summer strawberry cake | strawberry cake

  3. Train the model to learn what matters for the topic and what is just additional information.

Given the way the dataset is modeled, I have no static labels, as far as I understand. This makes things complicated, right?

As this is my first AI experiment with TensorFlow, I don’t really know whether this will work out or whether I should go with another strategy, so I would appreciate your help.

2 Answers

To me, this does not look like a TensorFlow task at all, at least not in the first place.

  1. The "normalize headlines" task is lemmatization. spaCy does a nice job here and has great documentation. Here is an example; have a look at the "lemma" property.

  2. Use food2vec as a database of topic names.

  3. Parse the sentence with spaCy and look up phrases in food2vec. The lookup should be done by phrase rather than word by word: first look for a 3-word phrase in the dictionary; if none is found, look for a 2-word phrase; then for a single word.
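The longest-phrase-first lookup in step 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `LEMMAS` dictionary is a toy stand-in for spaCy's lemmatizer, the `TOPICS` set is a toy stand-in for the topic names you would take from food2vec, and `normalize` and `find_topic` are hypothetical helper names, not part of either library.

```python
# Toy stand-ins (assumptions): a real pipeline would take lemmas from spaCy
# and the topic vocabulary from food2vec.
LEMMAS = {"cheesy": "cheese", "cheesecake": "cheese cake"}
TOPICS = {"cheese cake", "strawberry cake", "cake", "cheese"}


def normalize(headline):
    """Lowercase, split on whitespace, and map each word to its lemma."""
    tokens = []
    for word in headline.lower().split():
        # A lemma may expand to several tokens ("cheesecake" -> "cheese cake").
        tokens.extend(LEMMAS.get(word, word).split())
    return tokens


def find_topic(headline, topics=TOPICS, max_len=3):
    """Return the first of the longest phrases found in `topics`, or None."""
    tokens = normalize(headline)
    for n in range(max_len, 0, -1):  # 3-word, then 2-word, then 1-word phrases
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in topics:
                return phrase
    return None


print(find_topic("super yummy cheesy cake"))
print(find_topic("summer strawberry cake"))
```

Trying longer phrases first matters here: with a word-by-word lookup, "cheese" or "cake" alone would match before the more specific "cheese cake".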

This should be enough to solve your task.

Answered by Vlad-HC on December 5, 2020

You can frame this as a sequence-to-sequence prediction problem, similar to translation and summarization. This neural translation with attention colab is probably a really good place to start.
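To make the framing concrete, here is a minimal sketch of how the headline/topic pairs from the question could be turned into encoder and decoder sequences. It is plain Python and stops before any model is built. The special tokens, the fixed lengths, and the helper names (`build_vocab`, `encode`, `make_example`) are assumptions chosen for illustration, not taken from the colab.

```python
# Headline -> topic pairs, as in the question's dataset.
PAIRS = [
    ("super yummy cheesecake", "cheese cake"),
    ("summer strawberry cake", "strawberry cake"),
]
PAD, START, END = "<pad>", "<start>", "<end>"  # assumed special tokens


def build_vocab(texts):
    """Map each whitespace token to an integer id; ids 0-2 are reserved."""
    vocab = {PAD: 0, START: 1, END: 2}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, length):
    """Convert tokens to ids and right-pad with PAD up to `length`."""
    ids = [vocab[t] for t in tokens]
    return ids + [vocab[PAD]] * (length - len(ids))


def make_example(headline, topic, src_vocab, tgt_vocab, src_len=6, tgt_len=4):
    """Build (encoder_input, decoder_input, decoder_target) for one pair."""
    tgt = topic.split()
    encoder_input = encode(headline.split(), src_vocab, src_len)
    # Teacher forcing: the decoder sees START + topic and must predict
    # topic + END, i.e. the same sequence shifted by one position.
    decoder_input = encode([START] + tgt, tgt_vocab, tgt_len)
    decoder_target = encode(tgt + [END], tgt_vocab, tgt_len)
    return encoder_input, decoder_input, decoder_target


src_vocab = build_vocab(h for h, _ in PAIRS)
tgt_vocab = build_vocab(t for _, t in PAIRS)
examples = [make_example(h, t, src_vocab, tgt_vocab) for h, t in PAIRS]
```

Because the output is generated token by token, the topic does not have to come from a fixed set of labels, which addresses the "no static labels" concern in the question.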

Answered by Alexandre Passos on December 5, 2020
