TransWikia.com

mBED has time complexity $O(n lg n)$, claimed by Clustal Omega paper, why?

Bioinformatics Asked on December 7, 2021

In the Clustal Omega paper, it says the following about mBED ($N$ is the number of input sequences, I assume):

we use a modified version of mBed (Blackshields et al , 2010), which has complexity of $O(N log N)$

But in the mBED paper it says the following (section 2):

A number of $t$ sequences are initially sampled from the input dataset $X .$ Following the LLR
algorithm, this value is set by default to $t=left(log _{2} Nright)^{2} .$

After the seed sequences in $R$ have been chosen, all sequences in the input data are associated with a $t$-dimensional vector. This is done simply by computing the distances from all sequences to each of the $t$ seeds.

Then my conclusion is that this would lead to a complexity of $O(N (log N)^2)$ because for each sequence (in total $N$ of them), distances to $t = (log N)^2$ sequences must be calculated. Where did I do wrong?

One Answer

Turns out that I was dumb: I misread.

The Clustal Omega paper (in the supplementary materials) chose $t = log N$, which I think originates from this claim from this part of the Linial et al. paper:

In random polynomial time $X$ may be embedded in $l_{2}^{O(log n)}$ with distortion $leq(1+epsilon) cdot c_{2}(X)$ (for any $epsilon>0)$

Several questions still remain though:

  1. The newest implementation of Clustal Omega (that I downloaded) has source code that instead chooses $t = (log N)^2$, why changed?
  2. The original mBED paper chooses $t = (log N)^2$, but it was not justified on how they chose this number instead of $log N$

Answered by runeblaze on December 7, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP