# mBED has time complexity $O(n lg n)$, claimed by Clustal Omega paper, why?

Bioinformatics Asked by runeblaze on August 27, 2021

In the Clustal Omega paper, it says the following about mBED ($$N$$ is the number of input sequences, I assume):

we use a modified version of mBed (Blackshields et al , 2010), which has complexity of $$O(N log N)$$

But in the mBED paper it says the following (section 2):

A number of $$t$$ sequences are initially sampled from the input dataset $$X .$$ Following the LLR
algorithm, this value is set by default to $$t=left(log _{2} Nright)^{2} .$$

After the seed sequences in $$R$$ have been chosen, all sequences in the input data are associated with a $$t$$-dimensional vector. This is done simply by computing the distances from all sequences to each of the $$t$$ seeds.

Then my conclusion is that this would lead to a complexity of $$O(N (log N)^2)$$ because for each sequence (in total $$N$$ of them), distances to $$t = (log N)^2$$ sequences must be calculated. Where did I do wrong?

Turns out that I was dumb: I misread.

The Clustal Omega paper (in the supplementary materials) chose $$t = log N$$, which I think originates from this claim from this part of the Linial et al. paper:

In random polynomial time $$X$$ may be embedded in $$l_{2}^{O(log n)}$$ with distortion $$leq(1+epsilon) cdot c_{2}(X)$$ (for any $$epsilon>0)$$

Several questions still remain though:

1. The newest implementation of Clustal Omega (that I downloaded) has source code that instead chooses $$t = (log N)^2$$, why changed?
2. The original mBED paper chooses $$t = (log N)^2$$, but it was not justified on how they chose this number instead of $$log N$$

Answered by runeblaze on August 27, 2021

## Related Questions

### install bowtie2 from sources cannot find -ltbb

1  Asked on June 25, 2021 by suvar

### Normalize RNA seq data from multiple runs for expression analysis

5  Asked on June 25, 2021

### Tool for rna/lna melting temperature prediction

1  Asked on June 24, 2021

### Getting Unique Identifier List for GEO Datasets NCBI

2  Asked on June 24, 2021 by pawan-verma

### MSA (protein) with biopython or something else?

1  Asked on June 24, 2021 by curioustree

### Problem with merge data while trying to convert gene names in R

2  Asked on June 23, 2021 by equinox

### How to deal with the mismatches between gene names obtained from different sources?

0  Asked on June 22, 2021

### Is there any value in scaffolding the output contigs of MEGAHIT assembler given a metagenomic dataset?

2  Asked on June 21, 2021

### How to tell if our ligand-protein docking is good from AutoDock Vina’s result

1  Asked on June 20, 2021 by scamander

### Survival analysis using CoxPH – Effect of covariates

1  Asked on June 17, 2021 by beerzy

### Parsing .vcf file for this information

2  Asked on June 17, 2021

### Is there an efficient way to extract CIGAR strings for read pairs from bam files with python?

1  Asked on June 16, 2021 by mereven

### Filtering genes from cuffdiff results

1  Asked on June 13, 2021 by sujaypatil

### Low Fraction of usable antibody reads in CiteSeq

1  Asked on June 12, 2021 by gypti

### RNA_Seq Analysis in R, propmapped function issue

1  Asked on June 11, 2021 by pa_lvl

### Selecting part of an extracted ligand

1  Asked on June 11, 2021 by user8338

### Proper use of BWA MEM on multiplexed GBS sample

2  Asked on June 10, 2021 by plantgeek519

### co-occurrence analysis and visualization for amplicon microbial data

0  Asked on June 9, 2021

### How do I create a VCF file of all known pathogenic mutations in a gene of interest?

1  Asked on June 8, 2021 by nereus

### How do I write tests for a snakemake pipeline?

2  Asked on June 8, 2021