A peer-reviewed article of this preprint also exists.
Abstract
The coronavirus disease 2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) challenges include understanding what triggered SARS-CoV-2 emergence, how this RNA virus is evolving or how the genomic variability may impact the primary structure of proteins that are targets for vaccine. We analyzed 19471 SARS-CoV-2 genomes and 199,984 spike glycoprotein sequences available at the GISAID database from all over the world and 3335 genomes of other Coronoviridae family members available at Genbank, collecting SARS-CoV-2 high-quality genomes and distinct Coronoviridae family genomes. Here, we identify a SARS-CoV-2 emerging cluster containing 13 closely related genomes isolated from bat and pangolin that showed evidence of recombination, which may have contributed to the emergence of SARS-CoV-2. The analyzed SARS-CoV-2 genomes presented 9632 single nucleotide polymorphisms (SNPs) corresponding to a variant density of 0.3 over the genome, and a clear geographic distribution. SNPs are unevenly distributed throughout the genome and hotspots for mutations were found for the spike gene and ORF 1ab. We describe a set of predicted spike protein epitopes whose variability is negligible. All predicted epitopes for the structural E, M and N proteins are highly conserved. This result favors the continuous efficacy of the available vaccines.
Keywords:
Subject:
Biology and Life Sciences - Biochemistry and Molecular Biology
supplementary.zip (12.48MB )
Preprints on COVID-19 and SARS-CoV-2
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Alerts
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.