# Ancestry of the coronavirus 2019-nCov, WuHan city, China

Bioinformatics Asked by puppetsock on August 29, 2021

In one of the answers to another question about the coronavirus, a link was given to this phylogenetic analysis of the virus.

Can somebody assist a non-bio type here? It seems to show that the current coronavirus split from a virus in bats, and that the same ancestral virus is also ancestral to several other viruses, some in humans and some in bats.

Is my understanding basically correct?

**Overview**

The central purpose of the tree is to highlight the key biological concern about the new coronavirus, 2019-nCov: its genetic similarity to the virus behind the SARS epidemic, and how that relates to the SARS receptor.

**SARS background**

SARS is endemic in bats (your BioRxiv tree partly shows that and this tree definitely shows it). In the 2002 epidemic it infected civet cats, which then infected humans. More importantly, it was then transmitted from one human to another. This is a consequence of the SARS receptor being able to exploit the cellular receptors in the respiratory tract of bats, civet cats and humans and use this tissue as a site of replication. The concern is exacerbated because the tree here shows that SARS independently infected humans on two separate occasions, suggesting the cross-over has an underlying genetic basis that is part of the "SARS lineage".

**BioRxiv 2019-nCov tree**

Your tree shows that 2019-nCov shares a more recent common ancestor with SARS than with the rest of the betacoronaviruses. This provides some circumstantial evidence that the receptor mechanism, and the ability to cross over frequently from bats and ultimately transmit from human to human, could be shared with SARS. At an evolutionary level (which is what ANY tree is supposed to reconstruct) it raises the question of which common ancestor of SARS the "SARS receptor" originated in. I'd need to draw a diagram to demonstrate this point better, but I hope you get the idea.
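To make "more recent common ancestor" concrete, here is a minimal Python sketch on a toy rooted tree. The topology and the internal node names (`betacoronavirus_root`, `sars_like_ancestor`) are illustrative assumptions, not the actual BioRxiv tree. The point is only that the ancestor shared by SARS and 2019-nCov sits deeper in the tree (further from the root) than the ancestor either one shares with MERS.

```python
# Toy rooted tree stored as a child -> parent map.
# Hypothetical topology and node names, for illustration only.
PARENT = {
    "betacoronavirus_root": None,
    "MERS": "betacoronavirus_root",
    "sars_like_ancestor": "betacoronavirus_root",
    "SARS": "sars_like_ancestor",
    "2019-nCov": "sars_like_ancestor",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def mrca(a, b):
    """Most recent common ancestor: first node on b's path that is also above a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share an ancestor")


def depth(node):
    """Number of edges between `node` and the root."""
    return len(path_to_root(node)) - 1


print(mrca("SARS", "2019-nCov"))  # ancestor shared only by the SARS-like pair
print(mrca("SARS", "MERS"))       # ancestor shared with MERS is the root here
```

A deeper most recent common ancestor means the two viruses split from each other later than they split from the rest of the tree, which is what "recent" means in the paragraph above.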

**The "influenza receptor"**

The analogy is with influenza virus: the entire epidemiology of influenza hinges on the sialic acid receptor of influenza and its ability to bind to the cellular receptors in the upper/lower respiratory tract of birds, pigs and humans. The hypothesised mechanism is called the mixing vessel theory and is the classic epidemiological understanding of how new pathogenic influenza pandemics occur. If you replace "birds and pigs" with e.g. "bats, civet cats and humans" you get the idea of why this could be scary. We don't know the intermediate host of 2019-nCov ... but I speculate there must be one, unless eating Cov-infected bats is common in China.

**Technical details of the BioRxiv tree**

The tree is a nicely diverse selection of the betacoronaviruses. The authors have rooted the tree using deltacoronaviruses and gammacoronaviruses as outgroups, so it is a good, robust selection of outgroups that can be used to correctly identify the direction of evolution of betacoronavirus divergence. In tree theory (its formal name is phylogenetic theory), extensive rooting is good and minimises artefacts.

One of the earliest members of the betacoronaviruses to diverge is MERS (Middle East respiratory syndrome), which forms a single "clade" (all of its viruses share a unique common ancestor) and represents around 50% amino acid divergence from SARS/2019-nCov. The selection of MERS in the tree from both camels and humans looks good. The selection of other betacoronaviruses looks great; I wasn't aware of the "ruminant clade" at all, which involves infections of buffalo, cow etc. and an associated human infection. There are loads of bat isolates of the betacoronaviruses throughout the tree, but we eventually arrive at the "SARS clade". The authors show that 2019-nCov is an outgroup to the SARS clade and has a close relationship to one (BioRxiv tree) or two (this tree) bat isolates. Looking at the precise SARS clade (better shown in this tree) we can see loads of bat viruses associating with SARS lineages. We therefore assume that the reservoir of SARS, and likely of 2019-nCov, is bats, and moreover that the single ancestor of both viruses was a bat virus (this is called a parsimonious hypothesis).
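The definition of a clade used above (a set of viruses that all share a unique common ancestor) can be checked mechanically. The sketch below is a minimal Python illustration on a toy tree written as nested tuples; the topology and the tip names such as `bat_isolate_A` are hypothetical and only loosely follow the description in this answer.

```python
# Toy rooted tree as nested tuples; every string is a tip (leaf).
# Hypothetical topology, for illustration only.
TREE = (
    "outgroup",
    (
        ("MERS_camel", "MERS_human"),
        (
            "2019-nCov",
            ("bat_isolate_A", ("SARS_human", "SARS_civet")),
        ),
    ),
)


def leaves(node):
    """Return the set of tip names below `node`."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_clade(tree, names):
    """True if some node in `tree` has exactly `names` as its set of tips."""
    target = set(names)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, target) for child in tree)


print(is_clade(TREE, {"MERS_camel", "MERS_human"}))  # the two MERS tips
print(is_clade(TREE, {"MERS_human", "SARS_human"}))  # the two human tips
```

In this toy tree the two MERS tips form a clade, while the two human isolates do not, because their common ancestor also has camel, civet and bat descendants. That is the same logic behind inferring a bat reservoir from human isolates nested among bat isolates.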

The one thing the BioRxiv tree omits, compared with this tree for example, is the diversity of SARS and in particular the two independent origins of SARS. This is a weakness of their analysis, particularly if the tree fed into downstream analysis. That is not to say the authors were wrong, but it was uncool.

**Word of caution**

My understanding is that the divergence between SARS and 2019-nCov is around 15%, which is quite a large amount of genetic divergence, even if both viruses share a most recent common ancestor, form part of the same clade and share receptor peptide motifs. There is sufficient amino acid divergence to generate notable differences in epidemiology, clinical symptoms and transmission.
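For readers unfamiliar with how a figure such as "15% divergence" is obtained, a minimal sketch of the simplest measure (the proportion of differing sites between two aligned sequences) is given below in Python. The two sequences are made-up toy strings, not real viral proteins, and the function name `percent_divergence` is my own. Real analyses work from a proper alignment and often apply a substitution model on top.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Percentage of aligned, non-gap positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip alignment gaps rather than counting them as changes
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy 20-residue sequences differing at 3 positions (hypothetical data).
toy_a = "ACDEFGHIKLMNPQRSTVWY"
toy_b = "TCDEFAHIKLQNPQRSTVWY"
print(percent_divergence(toy_a, toy_b))  # 3 differences out of 20 sites
```

Three differences in twenty sites gives 15%. On a protein of realistic length the same percentage corresponds to many changed residues, which is why it leaves room for real biological differences.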

**MERS, SARS and 2019-nCov mortality rates**

It is worth noting that within the betacoronaviruses the mortality rates of the different clinically important viruses vary widely. MERS has a mortality rate of 40-60%, SARS is around 10%, but 2019-nCov is <2.5%. The 2019-nCov mortality rate is still important, given that it has infected more people than SARS and could potentially infect a lot more.
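The arithmetic behind that last point is simple: expected deaths are the number of infections multiplied by the mortality rate. The Python sketch below uses the rates quoted above (10% and the 2.5% upper bound) with purely hypothetical case counts, chosen only to show how a lower rate can still produce more deaths.

```python
def expected_deaths(cases, mortality_percent):
    """Expected deaths for a given case count and mortality rate (in percent)."""
    return cases * mortality_percent / 100


# Case counts below are hypothetical round numbers, not reported figures.
sars_like = expected_deaths(cases=8_000, mortality_percent=10)
ncov_like = expected_deaths(cases=100_000, mortality_percent=2.5)

print(sars_like)  # 800.0
print(ncov_like)  # 2500.0

# Ratio of the two rates: how many times more infections the lower-rate
# virus needs to cause the same number of deaths.
print(10 / 2.5)  # 4.0
```

With these rates, the lower-mortality virus causes more deaths as soon as it infects more than four times as many people.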

Correct answer by M__ on August 29, 2021
