# Is there a better-than-brute-force algorithm to generate a graph whose relation is string edit distance=1?

Computer Science Asked by 2.7182818284590452353602874713 on January 6, 2022

I’m interested in creating a graphs whose vertices are strings, and whose edges represent the relation of having an edit distance of 1 under a given string metric.

An obvious approach is to make all $$frac{n^2-n}{2}$$ comparisons among the $$n$$ vertices, giving us $$O(n^2)$$ time complexity.

Excluding parallelizing the comparisons, is there a better algorithm in terms of time complexity?

I’m interested in string metrics where strings of different length are allowed.

## 2 Answers

It's possible to use BK-Trees to speed this up. Inserting $$n$$ elements into the tree takes $$O(n log n)$$ time. After this you can query the tree for all strings whose edit distance is exactly one away from your input. Doing this for all strings again takes $$O(n^2)$$ complexity, however with a very small constant factor (this page mentions only 5-8% of the tree needs to be inspected per query).

Here's a short description of how it works:

### Data structure

A BK-Tree is either

• Empty
• A node with a root value $$r$$ and a set of indexed children $$c_i$$, each a BK-Tree (implemented using hash map, dynamic array, or similar)

A metric (important!) distance function $$d(x,y)$$ is used for insertion and queries.

### Inserting string $$s$$

• If the tree is empty, make it a new node with $$s$$ as the root value and no children
• If the tree is a node with root $$r$$ and children $$c_i$$, let $$k=d(r,s)$$.
• If $$k=0$$, skip insertion as $$s$$ is already at the root
• Otherwise recursively insert $$s$$ into the $$k$$-th child tree $$c_k$$.

The last step makes sure that all nodes in $$c_i$$ have distance $$i$$ to the root

### Querying string $$s$$

• If the tree is empty, return no results
• If the tree is a node with root $$r$$ and children $$c_i$$, let $$k=d(r,s)$$.
• If $$k=1$$, add $$r$$ as a result (Note: This step is different from usual BK-Tree queries)
• In addition, recursively query $$s$$ from the children $$c_{d-1}$$, $$c_d$$ and $$c_{d+1}$$. Return all results from those queries as well

The last step there is the magic of BK-Trees, as it doesn't need to check the other children due to the distance being metric. Here's a partial proof of why others don't need to be checked:

Let's assume there was some string $$x$$ which has distance one from our query, so $$d(s, x)=1$$, but it's in the child tree $$c_{k+2}$$. We know that $$d(r, x)=k+2$$ from the insertion procedure. This however (with the triangle inequality for metric spaces) gives

$$k+2=d(r, x)leq d(r, s)+d(s,x)=k+1$$

But this is a contradiction! Similar can be done for all children with $$i>k+1$$ and $$i. This means that all strings in other children won't have distance one by construction, so we don't need to even check them, which saves processing time.

Edit: Another explanation: https://signal-to-noise.xyz/post/bk-tree/

Answered by Silvan Mosberger on January 6, 2022

In the worst case any such algorithm will work $$Omega(n^2)$$ because your graph can have $$Omega(n^2)$$ edges.

By the way, are you interested in some particular string metric?

Answered by Vladislav Bezhentsev on January 6, 2022

## Related Questions

### Solving the Knapsack problem in $O(n^2P)$, where P is the maximum weight of all items

1  Asked on November 28, 2021 by user2207686

### Number of words of length n for special language

0  Asked on November 28, 2021

### Intuition for Church-Turing thesis for Turing machines

2  Asked on November 28, 2021

### Global-input-local-output p-time algorithms

0  Asked on November 25, 2021

### Does a regular expression exist for any number that contains no more than two 5s and no 6 twice in a row?

2  Asked on November 25, 2021 by hish

### Is $EVEN-SAT$ $NP$-hard?

2  Asked on November 23, 2021 by zur-luria

### Proof of the undecidability of compiler code optimization

5  Asked on November 23, 2021 by stephen-mwangi

### Counting circuits with constraints

1  Asked on November 23, 2021 by judy-l

### Uncurrying and Polymorphism

3  Asked on November 21, 2021

### Algorithm suggestion to order data with specific condition

0  Asked on November 21, 2021 by user3862410

### If $A$ is context-free then $A^*$ is regular

1  Asked on November 21, 2021

### Which is the best approach to solve Turing machines exercises?

2  Asked on November 17, 2021 by ocram

### What is the reason for writing parallel programs?

1  Asked on November 17, 2021 by user123521

### Are all Recursively Enumerable languages which are not Recursive also Undecidable?

1  Asked on November 17, 2021 by dshri

### Is English Turing-complete?

3  Asked on November 15, 2021

### If factor isn’t found in P-1 algorithm, should upper bound be increased linearly (i.e. +1)

0  Asked on November 13, 2021 by northerner

### Can map-reduce speed up the count-min-sketch algorithm?

1  Asked on November 11, 2021 by pragya

### Maximization problem on finite collection of finite sets

1  Asked on November 11, 2021 by yuezato

### Do there exist fast multiplication algorithms for two integers with one of them being static?

1  Asked on November 8, 2021 by john-flemin

### Convert the given NFA to DFA

2  Asked on November 5, 2021 by vinay-varahabhotla

### Ask a Question

Get help from others!

© 2023 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP