
Improve results using user input

Data Science Asked on June 13, 2021

I’ve developed a tool that retrieves the closest expressions from a database based on what the user typed. It uses word embeddings: each expression in the database is compared with the user input.

The top n results are retrieved, but the closest expressions are not necessarily the most relevant ones.

For example, typing: hospital machine

The top results will be "dialysis machine", "medical machine", …, but I’ll also find expressions like "building machine" and "office machine".

A user will most likely choose a medicine-related machine.

Is there a way to optimize my ranking system based on the user input while still relying on the similarity between the expression vectors?

One Answer

Understanding the similarity between two phrases has two aspects:

  1. How similar are the unique tokens in the phrases?
  2. How much should the individual tokens contribute to the overall phrase similarity?

To answer 1, you can use vector similarity, which gives a high score to tokens that are close in meaning. To answer 2, you should give the tokens importance weights, for example with a measure like tf-idf. When comparing hospital machine with building machine, machine is a frequent word in your corpus, so it should get a lower weight and contribute less to the overall similarity. Most of the similarity would then be determined by the similarity between hospital and building, which would solve your issue.
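To make this concrete, here is a minimal sketch of one way to combine the two aspects. It is only an illustration under several assumptions: the 3-dimensional vectors in `EMBEDDINGS` are invented toy values (a real system would load them from a trained model), only the idf half of tf-idf is used as the weight, and the helper names (`make_idf`, `weighted_similarity`, `rank`) are hypothetical. Each query token is matched to its most similar token in the candidate expression, and those per-token scores are averaged using the idf weights.

```python
import math
from collections import Counter

# Toy embeddings, invented for illustration only (assumption).
EMBEDDINGS = {
    "hospital": [0.90, 0.10, 0.00],
    "medical":  [0.80, 0.20, 0.00],
    "dialysis": [0.85, 0.05, 0.10],
    "building": [0.10, 0.90, 0.00],
    "office":   [0.20, 0.80, 0.10],
    "machine":  [0.30, 0.30, 0.90],
}

# Stand-in for the expressions stored in the database.
CORPUS = ["dialysis machine", "medical machine",
          "building machine", "office machine"]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_idf(corpus):
    """Return a function giving a smoothed idf weight for any token."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc.split()))

    def idf(token):
        # Tokens never seen in the corpus have doc_freq 0 and so get the
        # largest weight; very frequent tokens get a weight close to 1.
        return math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0

    return idf


def weighted_similarity(query, candidate, idf):
    """idf-weighted average of each query token's best match in candidate."""
    cand_vecs = [EMBEDDINGS[t] for t in candidate.split()]
    total = 0.0
    weight_sum = 0.0
    for tok in query.split():
        best = max(cosine(EMBEDDINGS[tok], v) for v in cand_vecs)
        w = idf(tok)
        total += w * best
        weight_sum += w
    return total / weight_sum


def rank(query, corpus):
    """Sort the corpus expressions from most to least similar to the query."""
    idf = make_idf(corpus)
    return sorted(corpus,
                  key=lambda c: weighted_similarity(query, c, idf),
                  reverse=True)


ranked = rank("hospital machine", CORPUS)
print(ranked)
```

With these toy vectors, machine appears in every expression and so gets the smallest weight, while hospital dominates the score, which pushes "dialysis machine" and "medical machine" ahead of "building machine" and "office machine". The sketch assumes every token has an embedding and weights only the query side; out-of-vocabulary tokens and a symmetric score would need extra handling.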

Answered by Gyan Ranjan on June 13, 2021
