Abstract
RaTG13, a betacoronavirus that exists as a genome sequence, is the closest relative of SARS-CoV-2 reported to date. RaTG13 was sequenced from a bat fecal swab collected in 2013 in Tongguan, Mojiang, Yunnan province, China. The RaTG13 genome sequence, MN996532.1, was deposited on 27 January 2020. The raw data (Illumina reads) followed 17 days later, on 13 February 2020 (https://www.ncbi.nlm.nih.gov/sra/SRX7724752[accn]). We compared the RNA-Seq data from the RaTG13 fecal swab with the corresponding data from other bat fecal swabs deposited by the same group. This comparison indicated that the RaTG13 raw data appear anomalous in several respects. Thirty percent of the reads did not match any reference sequence. Of the remaining 70%, an abnormally high proportion (~68%) derived from eukaryotes. According to the Krona analysis provided with the SRA data, these eukaryotic reads matched sequences not of one but of several bat species (round leaf bats, fruit bats and other bats), as well as other animals (squirrels, foxes, etc.). Bacterial reads made up only 0.7% of the swab. This is abnormally low compared with the 70-90% bacterial abundance in other bat fecal swabs. We also found a second raw dataset associated with RaTG13, amplicon sequencing of the genome (SRX8357956), submitted in May 2020. BLAST analysis showed that these amplicons collectively do not cover the whole genome (MN996532.1). On closer inspection, the dates recorded in the sequenced amplicon files were also found to be older (2017, 2018). Taken together, these anomalies in the raw data raise an important question about the overall authenticity of the RaTG13 genome sequence.
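The amplicon coverage check mentioned above can be illustrated with a short interval-merging sketch. This sketch is not the authors' pipeline. It assumes that the amplicons were aligned to MN996532.1 with BLAST and exported in tabular format (`-outfmt 6`), where columns 9 and 10 hold the subject start and end coordinates. The helper names (`parse_outfmt6`, `uncovered_gaps`, `coverage_fraction`) and the toy hits are hypothetical.

```python
from typing import Iterable, List, Tuple

# (start, end) on the reference genome, 1-based and inclusive.
Interval = Tuple[int, int]


def parse_outfmt6(lines: Iterable[str]) -> List[Interval]:
    """Collect subject (reference) intervals from BLAST -outfmt 6 lines.

    Columns 9 and 10 are sstart and send; for minus-strand hits sstart > send,
    so each pair is normalised to (low, high).
    """
    intervals: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        s, e = int(fields[8]), int(fields[9])
        intervals.append((min(s, e), max(s, e)))
    return intervals


def uncovered_gaps(intervals: List[Interval], genome_length: int) -> List[Interval]:
    """Return reference regions (1-based, inclusive) hit by no interval."""
    gaps: List[Interval] = []
    next_uncovered = 1  # leftmost position not yet known to be covered
    for start, end in sorted(intervals):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= genome_length:
        gaps.append((next_uncovered, genome_length))
    return gaps


def coverage_fraction(intervals: List[Interval], genome_length: int) -> float:
    """Fraction of reference positions covered by at least one interval."""
    uncovered = sum(end - start + 1 for start, end in uncovered_gaps(intervals, genome_length))
    return 1 - uncovered / genome_length


if __name__ == "__main__":
    # Hypothetical toy hits against a 1,000 nt reference (not real RaTG13 data).
    hits = [
        "amp1\tref\t99.5\t400\t2\t0\t1\t400\t1\t400\t0.0\t720",
        "amp2\tref\t99.0\t302\t3\t0\t1\t302\t901\t600\t0.0\t540",
    ]
    intervals = parse_outfmt6(hits)
    print(uncovered_gaps(intervals, 1000))  # [(401, 599), (902, 1000)]
    print(round(coverage_fraction(intervals, 1000), 3))  # 0.702
```

Sorting the hits by start and sweeping once makes overlapping amplicons count only once. The sweep also reports the gaps explicitly, and those uncovered regions are what a claim of incomplete coverage would need to point to.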