Note that six of these sequences fall under the terms of use of the GISAID platform. Katoh, K., Asimenos, G. & Toh, H. in Bioinformatics for DNA Sequence Analysis (ed. 92, 433440 (2020). & Boni, M. F. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. For weather, science, and COVID-19 . 1 Phylogenetic relationships in the C-terminal domain (CTD). The web application was developed by the Centre for Genomic Pathogen Surveillance. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. Natl Acad. 32, 268274 (2014). Evol. This is not surprising for diverse viral populations with relatively deep evolutionary histories. Evol. Split diversity in constrained conservation prioritization using integer linear programming. All four of these breakpoints were also identified with the tree-based recombination detection method GARD35. The idea is that pangolins carrying the virus, SARS-CoV-2, came into contact with humans. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans47. Proc. Virus Evol. Stegeman, A. et al. Lancet 395, 565574 (2020). Identifying the origins of an emerging pathogen can be critical during the early stages of an outbreak, because it may allow for containment measures to be precisely targeted at a stage when the number of daily new infections is still low. The time-calibrated phylogeny represents a maximum clade credibility tree inferred for NRR1. This provides compelling support for the SARS-CoV-2 lineage being the consequence of a direct or nearly-direct zoonotic jump from bats, because the key ACE2-binding residues were present in viruses circulating in bats. Pink, green and orange bars show BFRs, with regionA (nt 13,29119,628) showing two trimmed segments yielding regionA (nt13,29114,932, 15,40517,162, 18,00919,628). Coronavirus: Pangolins found to carry related strains. Below, we report divergence time estimates based on the HCoV-OC43-centred rate prior for NRR1, NRR2 and NRA3 and summarize corresponding estimates for the MERS-CoV-centred rate priors in Extended Data Fig. In March, when covid cases began spiking around India, Bani Jolly went hunting for answers in the virus's genetic code. Evol. Coronavirus Disease 2019 (COVID-19) Situation Report 51 (World Health Organization, 2020). A phylogenetic treeusing RAxML v8.2.8 (ref. In Extended Data Fig. Pangolin relies on a novel algorithm called pangoLEARN. The research leading to these results received funding (to A.R. performed recombination analysis for non-recombining alignment3, calibration of rate of evolution and phylogenetic reconstruction and dating. stand-alone pangolin work flows or Illumina DRAGEN COVID Lineage App (v3.5.5) following the default parameters. Nature 583, 286289 (2020). A distinct name is needed for the new coronavirus. Without better sampling, however, it is impossible to estimate whether or how many of these additional lineages exist. Bryant, D. & Moulton, V. Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. Decimal years are shown on the x axis for the 1.2 years of SARS sampling in c. d, Mean evolutionary rate estimates plotted against sampling time range for the same three datasets (represented by the same colour as the data points in their respective RtT divergence plots), as well as for the comparable NRA3 using the two different priors for the rate in the Bayesian inference (red points). In the absence of a strong temporal signal, we sought to identify a suitable prior rate distribution to calibrate the time-measured trees by examining several coronaviruses sampled over time, including HCoV-OC43, MERS-CoV, and SARS-CoV virus genomes. While pangolins could be acting as intermediate hosts for bat viruses to get into humansthey develop severe respiratory disease38 and commonly come into contact with people through traffickingthere is no evidence that pangolin infection is a requirement for bat viruses to cross into humans. He, B. et al. Conducting analogous analyses of codon usage bias as Ji et al. Google Scholar. SARS-CoV-2 is an appropriate name for the new coronavirus. Virological.org http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339 (2020). Google Scholar. BEAST inferences made use of the BEAGLE v.3 library68 for efficient likelihood computations. Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. B., Weaver, S. & Sergei, L. Evidence of significant natural selection in the evolution of SARS-CoV-2 in bats, not humans. Eight other BFRs <500nt were identified, and the regions were named BFRAJ in order of length. https://doi.org/10.1093/molbev/msaa163 (2020). These means are based on the mean rates estimated for MERS-CoV and HCoV-OC43, respectively, while the standard deviations are set ten times higher than empirical values to allow greater prior uncertainty and avoid strong bias (Extended Data Fig. Bioinformatics 28, 32483256 (2012). Zhang, Y.-Z. Wang, L. et al. Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. SARS-CoV-2 itself is not a recombinant of any sarbecoviruses detected to date, and its receptor-binding motif, important for specificity to human ACE2 receptors, appears to be an ancestral trait shared with bat viruses and not one acquired recently via recombination. These are in general agreement with estimates using NRR2 and NRA3, which result in divergence times of 1982 (19482009) and 1948 (18791999), respectively, for SARS-CoV-2, and estimates of 1952 (19061989) and 1970 (19321996), respectively, for the divergence time of SARS-CoV from its closest known bat relative. We say that this approach is conservative because sequences and subregions generating recombination signals have been removed, and BFRs were concatenated only when no PI signals could be detected between them. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. Extended Data Fig. We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig. Evol. Mol. Forni, D., Cagliani, R., Clerici, M. & Sironi, M. Molecular evolution of human coronavirus genomes. Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination. 4), that region and shorter BFRs were not included in combined putative non-recombinant regions. The origins we present in Fig. 26 March 2020. A.R. 4 TMRCAs for SARS-CoV and SARS-CoV-2. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. While there is involvement of other mammalian speciesspecifically pangolins for SARS-CoV-2as a plausible conduit for transmission to humans, there is no evidence that pangolins are facilitating adaptation to humans. 04:20. Unlike other viruses that have emerged in the past two decades, coronaviruses are highly recombinogenic14,15,16. 4. The inset represents divergence time estimates based on NRR1, NRR2 and NRA3. Preprint at https://doi.org/10.1101/2020.04.20.052019 (2020). To employ phylogenetic dating methods, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. Boni, M. F., de Jong, M. D., van Doorn, H. R. & Holmes, E. C. Guidelines for identifying homologous recombination events in influenza A virus. 3). It is clear from our analysis that viruses closely related to SARS-CoV-2 have been circulating in horseshoe bats for many decades. 21, 255265 (2004). Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. 5. Zhou, H. et al. M.F.B., P.L. G066215N, G0D5117N and G0B9317N)) and by the European Unions Horizon 2020 project MOOD (no. We used TreeAnnotator to summarize posterior tree distributions and annotated the estimated values to a maximum clade credibility tree, which was visualized using FigTree. 36, 17931803 (2019). 874850). And this genotype pattern led to creating a new Pangolin lineage named B.1.640.2, a phylogenetic sister group to the old B.1.640 lineage renamed B.1.640.1. Posada, D., Crandall, K. A. RegionC showed no PI signals within it. 62,63), the GTR+ model and 100bootstrap replicateswas inferred for each BFR >500nt. Nature 503, 535538 (2013). Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 18791999), 1969 (95% HPD: 19302000) and 1982 (95% HPD: 19482009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades. Maclean, O. and P.L.) matics program called Pangolin was developed. We extracted a total of 2189 full-length SARS-CoV-2 viral genomes from various states of India from the EpiCov repository of the GISAID initiative on 12 June 2020. While there is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2 (ref. CAS The unsampled diversity descended from the SARS-CoV-2/RaTG13 common ancestor forms a clade of bat sarbecoviruses with generalist propertieswith respect to their ability to infect a range of mammalian cellsthat facilitated its jump to humans and may do so again. Because 3SEQ identified ten BFRs >500nt, we used GARDs (v.2.5.0) inference on 10, 11 and 12 breakpoints. Thank you for visiting nature.com. Eden, J.-S., Tanaka, M. M., Boni, M. F., Rawlinson, W. D. & White, P. A. Recombination within the pandemic norovirus GII.4 lineage. The authors declare no competing interests. We thank all authors who have kindly deposited and shared genome data on GISAID. Uncertainty measures are shown in Extended Data Fig. Membrebe, J. V., Suchard, M. A., Rambaut, A., Baele, G. & Lemey, P. Bayesian inference of evolutionary histories under time-dependent substitution rates. Our most conservative approach attempted to ensure that putative NRRs had no mosaic or phylogenetic incongruence signals. To avoid artefacts due to recombination, we focused on NRR1 and NRR2 and the recombination-masked alignment NRA3 to infer time-measured evolutionary histories. Our third approach involved identifying breakpoints and masking minor recombinant regions (with gaps, which are treated as unobserved characters in probabilistic phylogenetic approaches). The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13. This statement informs us of the possibility that a virus has spilled over from a very rare and shy reptile-looking mammal . Abstract. =0.00025. Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. In the meantime, to ensure continued support, we are displaying the site without styles Mol. This leaves the insertion of polybasic. Gray inset shows majority rule consensus trees with mean posterior branch lengths for the two regions, with posterior probabilities on the key nodes showing the relationships among SARS-CoV-2, RaTG13, and Pangolin 2019. 382, 11991207 (2020). We named the length-sorted BFRs as: BFRA (ntpositions 13,29119,628, length=6,338nt), BFRB (ntpositions 3,6259,150, length=5,526nt), BFRC (ntpositions 9,26111,795, length=2,535nt), BFRD (ntpositions 27,70228,843, length=1,142nt) and six further regions (EJ). Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA, Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium, Department of Biological Sciences, Xian Jiaotong-Liverpool University, Suzhou, China, State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China, Department of Biology, University of Texas Arlington, Arlington, TX, USA, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK, MRC-University of Glasgow Centre for Virus Research, Glasgow, UK, You can also search for this author in Python 379 102 pangoLEARN Public Store of the trained model for pangolin to access. Su, S. et al. Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019). T.T.-Y.L. Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. A second breakpoint-conservative approach was conservative with respect to breakpoint identification, but this means that it is accepting of false-negative outcomes in breakpoint inference, resulting in less certainty that a putative NRR truly contains no breakpoints. This is evidence for numerous recombination events occurring in the evolutionary history of the sarbecoviruses22,33; specifying all past events in their correct temporal order34 is challenging and not shown here. 4). Viruses 11, 979 (2019). Intragenomic rearrangements involving 5-untranslated region segments in SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses, Crystal structure of the CoV-Y domain of SARS-CoV-2 nonstructural protein 3, Association of underlying comorbidities and progression of COVID-19 infection amongst 2586 patients hospitalised in the National Capital Region of India: a retrospective cohort study, Molecular characterization of horse nettle virus A, a new member of subgroup B of the genus Nepovirus, Molecular phylogeny of coronaviruses and host receptors among domestic and close-contact animals reveals subgenome-level conservation, crossover, and divergence. GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages. Rev. Alternatively, combining 3SEQ-inferred breakpoints, GARD-inferred breakpoints and the necessity of PI signals for inferring recombination, we can use the 9.9-kb region spanning nucleotides 11,88521,753 (NRR2) as a putative non-recombining region; this approach is breakpoint-conservative because it is conservative in identifying breakpoints but not conservative in identifying non-recombining regions. EPI_ISL_410721) and Beijing Institute of Microbiology and Epidemiology (W.-C. Cao, T.T.-Y.L., N. Jia, Y.-W. Zhang, J.-F. Jiang and B.-G. Jiang, nos. # File containing the ID of the samples, the Sequence of the haplotype, the Continent, the country, the Region, the Data, the Lineage of Pangolin and Nextstrain clade, and the haplotype number # In this order # Could be obtained from the database Of the nine breakpoints defining these ten BFRs, four showed phylogenetic incongruence (PI) signals with bootstrap support >80%, adopting previously published criteria on using a combination of mosaic and PI signals to show evidence of past recombination events19. & Andersen, K. G. The evolution of Ebola virus: insights from the 20132016 epidemic. J. Virol. The plots are based on maximum likelihood tree reconstructions with a root position that maximises the residual mean squared for the regression of root-to-tip divergence and sampling time. In this study, we report the case of a child with severe combined immu presenting a prolonged severe acute respiratory syndrome coronavirus 2 infection. a, Breakpoints identified by 3SEQ illustrated by percentage of sequences (out of 68) that support a particular breakpoint position. Genetic lineages of SARS-CoV-2 have been emerging and circulating around the world since the beginning of the COVID-19 pandemic. "This is an extremely interesting . Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. D.L.R. 95% credible interval bars are shown for all internal node ages. Trova, S. et al. A., Filip, I., AlQuraishi, M. & Rabadan, R. Recombination and lineage-specific mutations led to the emergence of SARS-CoV-2. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20,1.68), 0.35 (0.30,0.41) and 0.133 (0.129,0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. The genetic distances between SARS-CoV-2 and Pangolin Guangdong 2019 are consistent across all regions except the N-terminal domain, implying that a recombination event between these two sequences in this region is unlikely. Wang, H., Pipes, L. & Nielsen, R. Synonymous mutations and the molecular evolution of SARS-Cov-2 origins. The fact that they are geographically relatively distant is in agreement with their somewhat distant TMRCA, because the spatial structure suggests that migration between their locations may be uncommon. Results and discussion Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real-time ( 4 ). Download a free copy. Menachery, V. D. et al. In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially impact expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history40. T.L. 36, 7597 (2002). Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike proteins variable loop region and CTD region excluding the variable loop. MERS-CoV data were subsampled to match sample sizes with SARS-CoV and HCoV-OC43. Google Scholar. Biol. SARS-CoV-2 and RaTG13 are the most closely related (their most recent common ancestor nodes denoted by green circles), except in the 222-nt variable-loop region of the C-terminal domain (bar graphs at bottom). 13, e1006698 (2017). Biazzo et al.