Is there any package in Python that can identify similarity between alphanumeric alias names of a parameter?

Question

For example, for a parameter like input voltage, the alias names might be V_INPUT, VIN, etc.

Now, I want the software to recognize all of these alias names as the same parameter. Is there any package or method with which I can achieve this?
NLTK only seems to handle dictionary words.

n1k31t4 · Answer

If you know there are only specific variants, you can obviously make a look-up table yourself (i.e. a Python dictionary).
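As a minimal sketch of the look-up table idea: the alias table, the canonical names and the helper functions below are all made up for illustration and are not taken from the question. Normalizing the alias first (lower-casing it and stripping separators) lets one table entry cover spellings such as V_INPUT, v-input and VInput.

```python
import re

# Hypothetical alias table: normalized alias -> canonical parameter name.
ALIASES = {
    "vinput": "input_voltage",
    "vin": "input_voltage",
    "inputvoltage": "input_voltage",
    "voutput": "output_voltage",
    "vout": "output_voltage",
}


def normalize(name):
    """Lower-case the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def canonical_name(alias):
    """Return the canonical parameter name, or None if the alias is unknown."""
    return ALIASES.get(normalize(alias))


for alias in ["V_INPUT", "VIN", "v-in", "I_LOAD"]:
    print(f"{alias:<8} -> {canonical_name(alias)}")
```

Aliases that are not in the table come back as None, so you can log them and decide whether to extend the table or hand them to a fuzzy matcher.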

Otherwise you could try using a fuzzy matching library, like fuzzywuzzy.

This will give you a "closeness" score for your search term, based on your list of parameters (measurements). Here is an example of how you could use it:

In [1]: from fuzzywuzzy import process

In [2]: measurements = ["Voltage", "Current", "Resistance", "Power"]

In [3]: variants = ["VOLT", "voltage_in", "resistnce", "pwr", "amps"] # notice typos etc.

In [4]: for variant in variants:
   ...:     results = process.extract(variant, measurements, limit=2)
   ...:     print(f"{variant:<11} -> {results}")  # See which two were found to be closest 
   ...:     best = results[0]                     # Take the best match by score (first in the list)
   ...:     if best[1] < 70:                      # Set a threshold at 70%
   ...:         print(f"Rejected best match for '{variant}': {best}")

VOLT        -> [('Voltage', 90), ('Current', 22)]
voltage_in  -> [('Voltage', 82), ('Resistance', 30)]
resistnce   -> [('Resistance', 95), ('Current', 38)]
pwr         -> [('Power', 75), ('Current', 30)]
amps        -> [('Voltage', 26), ('Resistance', 22)]
Rejected best match for 'amps': ('Voltage', 26)

So most worked out pretty well, including the typo example.

Obviously this is not any kind of semantic search, so amps does not get related to Current in any way.
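If you only have a handful of such cases, one option is to put a small hand-written synonym table in front of the fuzzy matcher. The sketch below is only an illustration: the SYNONYMS table and the match_measurement function are hypothetical, and it uses difflib from the standard library in place of fuzzywuzzy so that it runs without extra installs. difflib scores are ratios between 0 and 1, so they are not directly comparable to the percentages shown above.

```python
import difflib

MEASUREMENTS = ["Voltage", "Current", "Resistance", "Power"]

# Hypothetical hand-written synonyms that string similarity cannot discover.
SYNONYMS = {"amps": "Current", "ohms": "Resistance", "watts": "Power"}


def match_measurement(term, cutoff=0.6):
    """Return the matching measurement name, or None if nothing is close enough."""
    key = term.strip().lower()
    # 1) Exact hit in the synonym table.
    if key in SYNONYMS:
        return SYNONYMS[key]
    # 2) Fall back to fuzzy string matching (case-insensitive).
    lowered = {m.lower(): m for m in MEASUREMENTS}
    close = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


for variant in ["VOLT", "voltage_in", "resistnce", "pwr", "amps", "xyz"]:
    print(f"{variant:<11} -> {match_measurement(variant)}")
```

The cutoff plays the same role as the threshold in the fuzzywuzzy example: anything below it is rejected rather than mapped to a poor match.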

To go down the route of semantic encodings, you might want to look into "word embeddings", which try to capture the actual meaning of words. To start, you could look into Word2Vec or GloVe embeddings. Perhaps there is even a tool or library that already offers this capability.

These approaches will not inherently deal with things like typos, so for best results, you could even combine the two approaches.

Dummy Scripts · Answer

Yes, there are a couple. My favorite is PyDictionary.
If you're using pip, make sure it is up to date, then run this command in a terminal:
pip install PyDictionary
Hope this helped.
