Abstract
The world is currently in crisis due to COVID-19, caused by the novel SARS-CoV-2. Globally, over 5 million people have been infected by SARS-CoV-2, with a 6% fatality rate. The surface spike (S) protein plays a key role in the pathogenesis of SARS-CoV-2 by mediating viral entry through human angiotensin-converting enzyme 2 (hACE2) receptors on the host cell, and there is a global race to find virus-neutralizing antibodies and a vaccine against the S protein of SARS-CoV-2. Since SARS-CoV-2 has evolved into 10 different clades in a very short span, a study of spike protein mutations is essential to ensure effective vaccine coverage globally. Based on the mutation analysis of the S protein from 166 Indian SARS-CoV-2 genomes, a total of 40 different SNPs comprising 14 synonymous and 26 non-synonymous mutations were observed, and notably, the Indian S protein diverged into two major clusters, D614 and G614, with 11 different types. The majority of Indian strains fall in the A2a and O clusters. Alarmingly, we observed six SNPs in the RBD, two of which are in the RBM (S438F and S494P). The S494P SNP, similar to bat SARS-like CoV, may indicate low ACE2 binding affinity. Interestingly, 38% of Indian strains harbor the characteristic D614G SNP, which was found predominantly in the A2a cluster, mostly comprising USA and European strains with high disease severity. The D614G SNP correlates well with disease severity in states with high death rates, with the exception of Maharashtra. Notably, more than 50% of D614G mutations were observed in the northern part of India and 14% in the southern part, but none in strains from Kerala and Tamil Nadu. Highly conserved motifs, one containing D614 (608-VAVLYQDVNCT-618) upstream of the S1/S2 furin cleavage site and a few downstream of it, may indicate a specific key role in efficient interaction with host proteases during pathogenesis. Further studies are warranted to clarify the association of the D614G SNP with disease severity.
Interestingly, the synonymous SNP C2367T (Y789Y) is observed in 37% of Indian strains, and notably, similar SNPs with degenerate bases were also observed. This is a key indication that Real-Time PCR may misdiagnose infections and that revised strategies are needed for precise diagnosis. The circulation of a high number of signature SNPs [D614G and C2367T (Y789Y)] in certain states may be an early indication of emerging community transmission in India. Further large-scale genome sequence data from India will aid a deeper understanding of the diversity of circulating SARS-CoV-2 and its impact on disease severity, the origin of cases imported to India, community spread, diagnosis, and vaccine coverage.
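The distinction between synonymous and non-synonymous SNPs, such as C2367T (Y789Y) and D614G, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study; the function names `locate_codon` and `classify_snp` are hypothetical, positions are assumed to be 1-based within the S gene coding sequence, and the standard genetic code is assumed. The reference codon TAC for Y789 follows from the C-to-T change at the third codon position, while GAT for D614 is an assumption (GAC would give the same classification).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def locate_codon(nt_pos):
    """Map a 1-based nucleotide position in a coding sequence to
    (1-based codon number, 0-based offset within that codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3


def classify_snp(ref_codon, offset, alt_base):
    """Substitute alt_base at the given offset of ref_codon and report
    (reference amino acid, altered amino acid, mutation class)."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


# C2367T falls on the third base of codon 789 (TAC -> TAT, both tyrosine).
codon_number, offset = locate_codon(2367)
print(codon_number, classify_snp("TAC", offset, "T"))

# D614G: assumed reference codon GAT, A -> G at the second base (GAT -> GGT).
print(614, classify_snp("GAT", 1, "G"))
```

Because a third-position change such as C2367T leaves the protein unchanged, it is invisible at the amino acid level yet still alters the nucleotide sequence that primers and probes must match.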