TransWikia.com

Mapping a set of corrupted strings to the correct ones

Data Science Asked by Algu607 on January 23, 2021

I am fairly new to data science, though I have encountered it before. The following problem troubles me, and I hope you can point me in the right direction.

The input is a set of strings, some of which carry the same information and some of which do not. An unknown number of these strings are crooked* to a varying degree, from only one letter off to complete garbage. The output should be the corrected versions of the input strings. The catch is that only certain, already known, combinations of valid strings are possible.

In a naive approach I chained some fuzzy searches and already got some promising results. Now I don't know how to proceed, or whether similar problems have already been solved.

* (are we still allowed to say this?)

One Answer

That problem is called approximate string matching or fuzzy string searching. It has been well studied in computer science.
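One common way to measure how far a corrupted string is from a candidate is the Levenshtein (edit) distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following is a minimal sketch of that measure, not necessarily the best choice for your data; the function name `levenshtein` is just an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A string that is "only one letter off" has distance 1 to its correct form, while complete garbage tends to be far from every valid string.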

If there is a limited, known collection of valid strings (i.e., a dictionary), then the problem can be framed as spelling correction.
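As a rough sketch of dictionary-based correction, Python's standard-library `difflib.get_close_matches` can map each input to its most similar valid string. The dictionary `VALID`, the helper name `correct`, and the cutoff of 0.6 are assumptions for illustration; you would substitute your own valid strings and tune the cutoff on your data.

```python
import difflib

# Hypothetical dictionary of valid strings (replace with your own).
VALID = ["temperature", "pressure", "humidity"]


def correct(s, valid=VALID, cutoff=0.6):
    """Return the most similar valid string, or None if nothing is close enough."""
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and drops any candidate scoring below the cutoff.
    matches = difflib.get_close_matches(s, valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in ["temperture", "presure", "xqzv"]:
    print(raw, "->", correct(raw))
```

Returning `None` for inputs that match nothing well is one way to flag the "complete garbage" cases for manual review instead of forcing them onto a wrong entry.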

Answered by Brian Spiering on January 23, 2021
