Computer Science Asked by 2.7182818284590452353602874713 on January 6, 2022

I’m interested in creating a graphs whose vertices are strings, and whose edges represent the relation of having an edit distance of 1 under a given string metric.

An obvious approach is to make all $frac{n^2-n}{2}$ comparisons among the $n$ vertices, giving us $O(n^2)$ time complexity.

Excluding parallelizing the comparisons, is there a better algorithm in terms of time complexity?

I’m interested in string metrics where strings of different length are allowed.

It's possible to use BK-Trees to speed this up. Inserting $n$ elements into the tree takes $O(n log n)$ time. After this you can query the tree for all strings whose edit distance is exactly one away from your input. Doing this for all strings again takes $O(n^2)$ complexity, however with a very small constant factor (this page mentions only 5-8% of the tree needs to be inspected per query).

Here's a short description of how it works:

A BK-Tree is either

- Empty
- A node with a root value $r$ and a set of indexed children $c_i$, each a BK-Tree (implemented using hash map, dynamic array, or similar)

A metric (important!) distance function $d(x,y)$ is used for insertion and queries.

- If the tree is empty, make it a new node with $s$ as the root value and no children
- If the tree is a node with root $r$ and children $c_i$, let $k=d(r,s)$.
- If $k=0$, skip insertion as $s$ is already at the root
- Otherwise recursively insert $s$ into the $k$-th child tree $c_k$.

The last step makes sure that all nodes in $c_i$ have distance $i$ to the root

- If the tree is empty, return no results
- If the tree is a node with root $r$ and children $c_i$, let $k=d(r,s)$.
- If $k=1$, add $r$ as a result (
**Note**: This step is different from usual BK-Tree queries) - In addition, recursively query $s$ from the children $c_{d-1}$, $c_d$ and $c_{d+1}$. Return all results from those queries as well

- If $k=1$, add $r$ as a result (

The last step there is the magic of BK-Trees, as it doesn't need to check the other children due to the distance being metric. Here's a partial proof of why others don't need to be checked:

Let's assume there was some string $x$ which has distance one from our query, so $d(s, x)=1$, but it's in the child tree $c_{k+2}$. We know that $d(r, x)=k+2$ from the insertion procedure. This however (with the triangle inequality for metric spaces) gives

$$ k+2=d(r, x)leq d(r, s)+d(s,x)=k+1 $$

But this is a contradiction! Similar can be done for all children with $i>k+1$ and $i<k-1$. This means that all strings in other children won't have distance one by construction, so we don't need to even check them, which saves processing time.

Edit: Another explanation: https://signal-to-noise.xyz/post/bk-tree/

Answered by Silvan Mosberger on January 6, 2022

In the worst case any such algorithm will work $Omega(n^2)$ because your graph can have $Omega(n^2)$ edges.

By the way, are you interested in some particular string metric?

Answered by Vladislav Bezhentsev on January 6, 2022

1 Asked on November 28, 2021 by user2207686

0 Asked on November 28, 2021

combinatorics formal languages regular languages strings word combinatorics

2 Asked on November 28, 2021

church turing thesis mu recursion turing completeness turing machines

0 Asked on November 25, 2021

2 Asked on November 25, 2021 by hish

finite automata pumping lemma regular expressions regular languages

5 Asked on November 23, 2021 by stephen-mwangi

1 Asked on November 23, 2021 by judy-l

3 Asked on November 21, 2021

functional programming higher order logic lambda calculus polymorphisms programming languages

0 Asked on November 21, 2021 by user3862410

algorithm analysis algorithms coding theory order theory ordering

1 Asked on November 21, 2021

2 Asked on November 17, 2021 by ocram

1 Asked on November 17, 2021 by user123521

1 Asked on November 17, 2021 by dshri

0 Asked on November 13, 2021 by northerner

1 Asked on November 11, 2021 by pragya

1 Asked on November 11, 2021 by yuezato

1 Asked on November 8, 2021 by john-flemin

2 Asked on November 5, 2021 by vinay-varahabhotla

Get help from others!

Recent Questions

- How Do I Get The Ifruit App Off Of Gta 5 / Grand Theft Auto 5
- Iv’e designed a space elevator using a series of lasers. do you know anybody i could submit the designs too that could manufacture the concept and put it to use
- Need help finding a book. Female OP protagonist, magic
- Why is the WWF pending games (“Your turn”) area replaced w/ a column of “Bonus & Reward”gift boxes?
- Does Google Analytics track 404 page responses as valid page views?

Recent Answers

- Joshua Engel on Why fry rice before boiling?
- Jon Church on Why fry rice before boiling?
- Peter Machado on Why fry rice before boiling?
- Lex on Does Google Analytics track 404 page responses as valid page views?
- haakon.io on Why fry rice before boiling?

© 2023 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP