TransWikia.com

Text Mapping - Medicine Names

Data Science Asked by Kaushal Shah on July 14, 2021

We have a standardized database of medicine names. On the other hand, there is a subset of medicine names that may contain spelling mistakes, a different structure or hyphens, missing words, and so on. Some metadata is also available, such as manufacturer name and unit size.

A human can easily map the two databases to each other. We have used some string comparison to build a probabilistic score, and in some cases it serves the purpose.

But we often run into nuanced issues, and the special-case conditions keep piling up. Can any machine learning algorithm help here? I have a basic understanding of all the major algorithms, yet I am drawing a blank on this problem. A simple example is mapping Epilex 300mg tab to Epilex 300 Tablet. I can give more examples if needed.
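To make the current approach concrete, here is a minimal sketch of the kind of string-comparison scoring described above. It is an illustration, not our production code: the `ABBREVIATIONS` map, the function names, and the candidate list are hypothetical, and the standard library's `difflib.SequenceMatcher` ratio stands in for whatever scoring is actually used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation map; a real one would be built from the data.
ABBREVIATIONS = {"tab": "tablet", "tabs": "tablet", "cap": "capsule", "caps": "capsule"}


def normalize(name):
    """Lowercase, split digits from units, drop punctuation, expand abbreviations."""
    name = name.lower()
    name = re.sub(r"(\d)([a-z])", r"\1 \2", name)  # "300mg" -> "300 mg"
    name = re.sub(r"[^a-z0-9. ]+", " ", name)      # hyphens etc. become spaces
    tokens = [ABBREVIATIONS.get(token, token) for token in name.split()]
    return " ".join(tokens)


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(raw_name, standard_names):
    """Return (score, standard_name) for the highest-scoring candidate."""
    return max((similarity(raw_name, s), s) for s in standard_names)


if __name__ == "__main__":
    # Hypothetical standardized names, for illustration only.
    candidates = ["Epilex 500 Tablet", "Epilex 300 Tablet"]
    print(best_match("Epilex 300mg tab", candidates))
```

Metadata such as manufacturer name could be scored the same way and combined with the name score, but how to weight the two is exactly the kind of condition that keeps piling up.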

One Answer

I think there is no need to use ML for this problem. It can be solved with a simple lookup table (mapping table). The catch is that the lookup table has to be updated whenever a new category is encountered.
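A minimal sketch of the lookup-table idea, under some assumptions: the names `make_key`, `lookup`, `unmapped`, `map_name`, and `add_mapping` are hypothetical, and the key normalization (lowercasing and collapsing punctuation) is one possible choice rather than a requirement.

```python
import re


def make_key(name):
    """Collapse case, punctuation, and spacing so trivial variants share one key."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Hypothetical table: normalized variant -> standardized name.
lookup = {make_key("Epilex 300mg tab"): "Epilex 300 Tablet"}

# Names that had no entry, kept for manual review.
unmapped = []


def map_name(raw):
    """Return the standardized name, or None (and queue the name) if unknown."""
    key = make_key(raw)
    if key in lookup:
        return lookup[key]
    unmapped.append(raw)
    return None


def add_mapping(raw, standard):
    """Record a manually confirmed mapping so the variant resolves next time."""
    lookup[make_key(raw)] = standard
```

Names that return `None` accumulate in `unmapped`; after someone reviews them, `add_mapping` is how the table gets updated for a new category.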

To apply ML to this problem, you need a data set of paired unprocessed and processed names, such as [Epilex 300mg (unprocessed), Epilex 300 (processed)]. With this data, you can train a decision tree (without pruning) to predict the processed text from the unprocessed text. Remember that the model can only predict categories it saw during training. When a new category occurs, you can manually add that example to the training data and retrain the model; the next time that category is encountered, it should be predicted correctly. In this way you improve the model's predictive power, and over time it should handle most responses correctly. Screenshots of sample code are attached here.

[Screenshot: Sample Data]

[Screenshot: Solution]
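As a rough text sketch of the decision tree approach, assuming scikit-learn is available: the training pairs, the "Medexa" name, the helper names `train` and `add_and_retrain`, and the character n-gram features are all hypothetical choices for illustration and are not taken from the screenshots. `DecisionTreeClassifier` does no pruning with its default settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training pairs: (unprocessed name, processed name).
pairs = [
    ("Epilex 300mg tab", "Epilex 300 Tablet"),
    ("Epilex-300 tablets", "Epilex 300 Tablet"),
    ("Epilex 500mg tab", "Epilex 500 Tablet"),
    ("Epilex 500 tabs", "Epilex 500 Tablet"),
]


def train(pairs):
    """Fit an unpruned decision tree on character n-gram counts."""
    raw, processed = zip(*pairs)
    model = make_pipeline(
        # Character n-grams are an assumed feature choice, tolerant of small typos.
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),
        DecisionTreeClassifier(random_state=0),  # defaults: no depth limit, no pruning
    )
    model.fit(list(raw), list(processed))
    return model


def add_and_retrain(pairs, raw, processed):
    """Add a manually labeled example for a new category and retrain from scratch."""
    pairs.append((raw, processed))
    return train(pairs)


if __name__ == "__main__":
    model = train(pairs)
    print(model.predict(["Epilex 300mg tab"])[0])
    # A category the model has never seen is forced into a known one...
    print(model.predict(["Medexa 200mg tab"])[0])
    # ...until it is labeled by hand and the model is retrained.
    model = add_and_retrain(pairs, "Medexa 200mg tab", "Medexa 200 Tablet")
    print(model.predict(["Medexa 200mg tab"])[0])
```

Because the tree can only output labels it was trained on, an unseen medicine is always mapped to some existing category, which is why the manual add-and-retrain step matters.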

For the actual code, see this link.

Answered by Venkatesh Gandi on July 14, 2021
