A new paper suggests the Corona Virus has "Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1" - What does this mean?

Bioinformatics Asked by SurpriseDog on August 27, 2021


We found 4 insertions in the spike glycoprotein (S) which are unique to the 2019-nCoV and are not present in other coronaviruses. Importantly, amino acid residues in all the 4 inserts have identity or similarity to those in the HIV-1 gp120 or HIV-1 Gag. Interestingly, despite the inserts being discontinuous on the primary amino acid sequence, 3D-modelling of the 2019-nCoV suggests that they converge to constitute the receptor binding site. The finding of 4 unique inserts in the 2019-nCoV, all of which have identity/similarity to amino acid residues in key structural proteins of HIV-1 is unlikely to be fortuitous in nature.


I know this is a preprint paper and it’s not yet peer reviewed, but can someone tell me what the implications are if this is true? Does this mean that the virus is artificially constructed?

3 Answers

UPDATE: The article has now been withdrawn with the following note:

This paper has been withdrawn by its authors. They intend to revise it in response to comments received from the research community on their technical approach and their interpretation of the results. If you have any questions, please contact the corresponding author.

This is very odd, and will require a rigorous investigation, but my initial reaction is one of scepticism. Considering just the 1st insert, the insert sequence is GTNGTKR, short at just 7 amino acids. A simple BLASTP vs NR did not find perfect matches to HIV sequences, but did reveal 100% identity across the full sequence length to >50 other short protein sequences, which could be spurious 'chance' hits of course. Many of the top 100 hits are against eukaryotic sequences, for example one is against Pristionchus pacificus, a type of nematode worm (see alignment below). Someone needs to do a proper peer review of this preprint before any conclusions are drawn from it.

>tank-1 [Pristionchus pacificus]
Sequence ID: PDM74036.1 Length: 2481 
Range 1: 1474 to 1480

Score:24.0 bits(49), Expect:2477, 
Identities:7/7(100%), Positives:7/7(100%), Gaps:0/7(0%)

Query  1     GTNGTKR  7
Sbjct  1474  GTNGTKR  1480
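
To put the short-query problem in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes all 20 amino acids are equally likely and independent (real proteins are not), and the database size is a hypothetical round figure picked for illustration, not the actual size of NR. The function name is my own.

```python
def expected_exact_matches(k, n_positions, alphabet_size=20):
    """Expected number of chance exact matches to one specific k-mer.

    Assumes every residue is equally likely and independent, so treat
    the result as an order-of-magnitude guide only.
    """
    p_match = (1.0 / alphabet_size) ** k  # chance that one start position matches
    return n_positions * p_match


# Hypothetical database size in residues, for illustration only (not the real NR size).
ASSUMED_DB_RESIDUES = 1e11

if __name__ == "__main__":
    hits = expected_exact_matches(7, ASSUMED_DB_RESIDUES)
    print(f"Expected chance matches to a 7-mer: {hits:.1f}")
```

Under these assumptions a 7-residue query is expected to find dozens of perfect matches purely by chance, which fits with the very large Expect value in the alignment above.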

UPDATE 1: Things are moving fast. There are now 10 comments under the preprint, all of them critical of the idea that these inserts are meaningfully similar to HIV-1 sequences, both for the reason I outlined above (i.e. short sequences with many hits against organisms from across the tree of life) and for additional reasons, such as small insertions being a quite normal part of evolution for RNA viruses. There are also similar critiques on Twitter, and bioRxiv has added the following disclaimer header to their website:

bioRxiv is receiving many new papers on coronavirus 2019-nCoV. A reminder: these are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

I have never seen such rapid (and unanimous) post-publication peer review before. It is therefore already (just 1 day after publication) fairly clear that at least some claims in this study are completely false, and there is no good evidence that 2019-nCoV has acquired sequences from HIV, nor any evidence to suggest the virus is engineered.

Correct answer by Chris_Rands on August 27, 2021

The "inserts" described in the manuscript are normally called "indels" in protein alignments, short for insertions and deletions.

What I think has happened is that a group investigating indels in HIV env noticed indels in 2019-nCoV. Essentially, I think the correlation is spurious, though I haven't tested it. The area of research into understanding indels is certainly valid and important.

What is certain is that indels induce a large structural change to a protein structure and any Gibbs free-energy style calculation will identify this.

Vaccine

The spike protein will be the primary candidate for a 2019-nCoV vaccine, which is a very important reason why the sequence was released so rapidly. It is an important protein, and the structural changes that indels induce mean that a SARS vaccine will probably not provide much protection against 2019-nCoV, even apart from the amino acid divergence (below).

Differences 2019-nCoV vs HIV

In summary, a lot. HIV env and particularly HIV gag are very different from coronavirus proteins, and the viruses differ in the mechanism of genome replication (a coronavirus never leaves the cytoplasm), clinical outcomes, tissue tropism and duration of infection.

Similarities

HIV env and the coronavirus spike glycoprotein are both receptor-binding proteins used to gain entry into a cell. They are called structural proteins. Entry to a cell can be blocked by antibodies, and these antibodies are called "neutralizing antibodies". Neutralizing antibodies are catastrophic for a virus. Other antibody responses can be effective, such as IgM, but to clear an infection using antibodies alone, you need neutralizing antibodies. Both HIV env and the coronavirus spike protein are subject to neutralizing antibodies. HIV gag has nothing to do with HIV env, in terms of function or antibody exposure. This is why the spike protein will be the primary candidate for a subunit vaccine.

Coincidence, law of chance

There is large variation in indels in HIV env within HIV, and what the authors are inferring is that this resembles the indel differences between SARS and 2019-nCoV. In my opinion this is a coincidence, because they are comparing a large repertoire of HIV variants against a single indel pattern in the coronaviruses.
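
The "large repertoire versus a single pattern" point is the usual multiple-comparisons effect, and can be sketched as below. The per-comparison probability and the number of variants are hypothetical values chosen only to show the shape of the effect; nothing here is estimated from real HIV or coronavirus data, and independence between comparisons is assumed.

```python
def prob_at_least_one_chance_match(p_single, n_comparisons):
    """Probability of at least one chance match over many independent comparisons.

    p_single: assumed probability that one comparison matches by chance.
    n_comparisons: number of sequences (e.g. variants) compared against.
    """
    return 1.0 - (1.0 - p_single) ** n_comparisons


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    for n in (1, 100, 5000):
        p = prob_at_least_one_chance_match(0.001, n)
        print(f"{n:>5} comparisons -> P(at least one chance match) = {p:.3f}")
```

Even when a single comparison is very unlikely to match by chance, the probability of seeing at least one match approaches 1 as the number of variants compared grows.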

Why coronavirus indels?

That is a very good question. Generally, indels in viral surface antigen genes are common, much more common than in other proteins such as those involved in virus replication (non-structural proteins). The amino acid identity between SARS and 2019-nCoV is 80%, and in any virus group, such as the flaviviruses, 80% identity means indels will be present in surface antigens between the viruses. The answer is that it is not unusual in any RNA virus to see indels at a comparatively large amino acid divergence.
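
For anyone wanting to check figures like these on their own alignment, here is a minimal Python sketch that takes two already-aligned sequences and reports percent identity and the number of indel events. The two sequences are made-up toy strings, not real spike or E-protein data, and the function name is my own. Identity is computed over columns where neither sequence has a gap, which is one of several conventions in use.

```python
def identity_and_indels(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of indel events) for an aligned pair.

    Identity is counted over columns where neither sequence has a gap.
    A run of consecutive gapped columns is counted as one indel event
    (a simplification: adjacent gaps in opposite sequences merge into one).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    indel_events = 0
    in_gap = False
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        if a == gap or b == gap:
            if not in_gap:
                indel_events += 1  # start of a new gap run
            in_gap = True
        else:
            in_gap = False
            compared += 1
            if a == b:
                matches += 1
    identity = 100.0 * matches / compared if compared else 0.0
    return identity, indel_events


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration.
    seq_a = "MKT--LVAGS"
    seq_b = "MKTQRLVSGS"
    print(identity_and_indels(seq_a, seq_b))
```

The toy pair has one two-residue gap and one substitution, so it reports a single indel event alongside the identity over the ungapped columns.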

What function could they serve?

I've briefly looked at indel bioinformatics between flaviviruses (Zika virus, yellow fever virus, etc.), notably using envelope (E) protein sequences, and indels also occur between African Zika viruses in the E protein. The E protein is the equivalent of the coronavirus spike protein: the receptor-binding protein. No one has ascribed a function to these indels, and that is the problem with this manuscript.


  • One theory is that a structural change in the protein will occur to stop antibody binding.
  • Another theory is that they confer functional differences, such as in cell tropism.

Bioinformatically separating the two theories is extremely hard without wetlab experimentation.

Answered by M__ on August 27, 2021

SurpriseDog - I suspect the wording "unlikely to be fortuitous in nature" is what led you to ask "Does this mean the virus is artificially constructed?" - No!

I think the writing here is imprecise and confusing. There is no implication that these are human-generated (non-natural/engineered) insertions but rather I think they are implying that these mutations are likely to have functional implications. Whether there is any relation to HIV is questionable.

There has been some political finger pointing and threats of a full-scale investigation into whether the virus escaped from a lab. Hard to say, but current thinking is that SARS-CoV-2 is a natural isolate, not engineered. It was probably transmitted from bats.

Answered by neonglow on August 27, 2021
