# Are the conclusions in "The proximal origin of SARS-CoV-2" legit?

Bioinformatics Asked on September 2, 2021

I don’t have any background in genetics or bioinformatics, so I would like to ask whether you find the arguments in the article "The proximal origin of SARS-CoV-2" by Andersen et al. convincing. In particular:

While the analyses above suggest that SARS-CoV-2 may bind human ACE2 with high affinity, computational analyses predict that the interaction is not ideal and that the RBD sequence is different from those shown in SARS-CoV to be optimal for receptor binding. Thus, the high-affinity binding of the SARS-CoV-2 spike protein to human ACE2 is most likely the result of natural selection…

I interpret this passage as “if the virus had been engineered, they would have done a better job”. Is this interpretation correct? And if it is, does it sound convincing to you as proof that the virus evolved in nature?

The second argument given in the article is the following:

Furthermore, if genetic manipulation had been performed, one of the several reverse-genetic systems available for betacoronaviruses would probably have been used

Again, do you think that this excludes the possibility of human intervention in the creation of the virus?

Edit: I am a fan of Occam's razor, and I know that the scenario in which the virus originated in the wild is far more likely than the human-engineered scenario. I just want to know whether, given our current knowledge of genetics, it would have been possible for highly skilled researchers to engineer the virus behind COVID-19 (without implying that they did so with any bad purpose).

Edit 2: I share the concerns of the author of the post "China owns Nature magazine’s ass – Debunking 'The proximal origin of SARS-CoV-2' claiming COVID-19 definitely wasn’t from a lab", namely that the findings of the article are merely opinions and that some of the arguments are misleading. Can anyone with a solid technical background comment on this?

In summary, the authors are saying the complete opposite of "human intervention".

While the analyses above suggest that SARS-CoV-2 may bind human ACE2 with high affinity, computational analyses predict that the interaction is not ideal and that the RBD sequence is different from those shown in SARS-CoV to be optimal for receptor binding.

The interaction with ACE2 takes place on the spike (S) protein in both COVID-19 and SARS-CoV. The S protein differs considerably between COVID-19 and SARS: the sequences are very different, and COVID-19 includes an additional furin cleavage site, so structurally it is quite distinct. The "computational analyses predict" part is difficult because there is no crystal structure for the COVID-19 S protein, so modelling ligand binding is essentially guesswork, particularly as the protein comprises an additional (furin-)cleaved domain. What the authors appear to conclude is that they cannot map ACE2-spike binding of COVID-19 using an established SARS S protein structure. I assume they are homology modelling the COVID-19 S protein sequence data on a SARS-CoV structure. In my personal opinion there is no way to homology model an additional furin protease cleavage site (furin is a host protease), so it would be difficult for a "good fit" to occur. That is just my opinion, though, and the authors may have established methods to overcome this notable limitation; similar phenomena occur in influenza, for example.

Thus, the high-affinity binding of the SARS-CoV-2 spike protein to human ACE2 is most likely the result of natural selection…

Their conclusion of "natural selection" is not strictly accurate and should be "Darwinian positive selection", but it's a small issue. What they are saying is that Darwinian adaptation has occurred, as opposed to "purifying selection", i.e. conservation. Thus they are definitely excluding human intervention, because that is not "natural selection".

Furthermore, if genetic manipulation had been performed, one of the several reverse-genetic systems available for betacoronaviruses would probably have been used

They appear to be saying that if a synthetic virus had been constructed, it would have been created using an established reverse-genetics system, and all of the viruses produced that way to date look nothing like COVID-19.

The phylogenetics argument is always a stronger argument than either of the above (although my opinion isn't entirely objective). Phylogeneticists have a long history of opposing conspiracy theories, e.g. about polio vaccination and HIV, and the list goes on, so it is much more natural territory. An example of such an argument is here, which @terdon helpfully pointed out.

To address the questions of @Hans

Question 1

• Purifying selection = any evolutionary change is deleterious, so the virus is less able to transmit between humans. The vast majority of mutations are deleterious in classical Darwinian thinking.*
• Adaptation = the amino acid change makes the virus better able to transmit between humans. In phylogenetics this is called 'positive selection', which is detected at the nucleotide level; it is, however, a very stringent test.

\* There is also something called nearly neutral theory, but that's just complicated.
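The distinction between conserved and changed amino acids can be made concrete with a minimal Python sketch. It classifies each differing codon between two aligned, in-frame coding sequences as synonymous (same amino acid) or nonsynonymous (different amino acid), using the standard genetic code. The function name `count_codon_changes` and the toy sequences are made up for illustration, and this is only a crude count: a real test for positive selection normalises by the number of synonymous and nonsynonymous sites and works on a phylogeny, which this sketch does not attempt.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(ref: str, alt: str) -> Tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes both sequences are in frame, gap-free, unambiguous DNA
    of the same length (an assumption of this sketch).
    """
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(ref), 3):
        codon_ref = ref[i:i + 3].upper()
        codon_alt = alt[i:i + 3].upper()
        if codon_ref == codon_alt:
            continue  # identical codon, nothing to classify
        if CODON_TABLE[codon_ref] == CODON_TABLE[codon_alt]:
            synonymous += 1      # nucleotide change, same amino acid
        else:
            nonsynonymous += 1   # nucleotide change, different amino acid
    return synonymous, nonsynonymous


# Toy example (hypothetical sequences, not real viral data):
# GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
toy_ref = "ATGGCTAAAGGA"
toy_alt = "ATGGCCAGAGGA"
toy_counts = count_codon_changes(toy_ref, toy_alt)  # (1, 1)
```

Under purifying selection one would expect nonsynonymous changes to be rare relative to synonymous ones; an excess of nonsynonymous changes is the kind of signal that formal positive-selection tests look for.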

Question 2

What we are talking about is a virus infectious clone system. For a 30 kb virus this would not be trivial at all, although I agree that for 10 kb viruses it is much easier. However, the phylogenetics argue against this, because SARS-CoV-2 is 5% divergent from a virus (RaTG13) isolated a long time before the current epidemic. You can't just engineer a virus that is 5% divergent from a bat virus across the entire genome and has never been seen before. It would be a work of genius the likes of which we have never seen, using bioinformatics as yet unheard of, because you would encounter endless deleterious mutations. Even if you got past all of those, how would you know what fitness traits you were aiming for?
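To put the 5% figure in perspective, here is a small Python sketch of the arithmetic. It computes the proportion of differing sites (the p-distance) between two aligned sequences and then scales the approximate numbers quoted above (about 30 kb and 5%) to a count of nucleotide differences. The function name `p_distance` and the toy sequences are invented for illustration; real genome comparisons would start from a proper alignment.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore sites that cannot be compared
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned sequences (made up): 1 difference over 20 sites.
toy_a = "ATGGCGTACGTTAGCTAGGA"
toy_b = "ATGGCGTACGTAAGCTAGGA"
toy_distance = p_distance(toy_a, toy_b)  # 0.05

# Rough figures from the answer: ~30 kb genome, ~5% divergence.
GENOME_LENGTH = 30_000
DIVERGENCE = 0.05
expected_differences = round(GENOME_LENGTH * DIVERGENCE)  # 1500
```

In other words, 5% divergence over a genome of roughly 30,000 nucleotides corresponds to on the order of 1,500 differences spread across the genome, which is the scale of change the argument above says could not plausibly be designed by hand.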

Serial passage

This level of change is doable by serial passage of the virus, but that leads to attenuation of the virus (making it less dangerous), not increased fitness. The yellow fever virus vaccine was created in this way. Even if a virus had been serially passaged we could tell, because we know the mutational patterns of other serially passaged viruses.

Correct answer by M__ on September 2, 2021
