2. The pathogen
According to the International Committee on Taxonomy of Viruses [
6], the pathogen responsible for COVID-19 was identified as a betacoronavirus belonging to the same family of viruses that can cause influenza-like syndromes and colds, and was eventually named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). It is an enveloped virus with a single-stranded, positive-sense RNA genome composed of 13 to 15 open reading frames (ORFs), 12 of which are functional. It contains four major structural proteins: spike (S, which includes the receptor-binding domain (RBD)), envelope (E), membrane (M), and nucleocapsid (N), along with additional genes encoding non-structural and accessory proteins, including the RNA-dependent RNA polymerase (RdRp) (
Figure 1) [
7].
The total length of the SARS-CoV-2 genome is approximately 30,000 bases, making it one of the longest among RNA viruses, with important consequences for proofreading during transcription, particularly with respect to the risk of introducing errors [
8].
The key enzyme of all RNA viruses, both negative- and positive-sense, is the RdRp, which is essential for replication and transcription of viral RNA. In positive-sense single-stranded RNA viruses such as SARS-CoV-2, the enzyme directly transcribes the positive-sense RNA, which acts exactly like a messenger RNA, and also performs a two-step conversion of positive-sense RNA into negative-sense RNA and then back into positive-sense RNA, which is assembled into the final viral particle [
9].
The spike protein of SARS-CoV-2 is composed of 1273 amino acids and contains two major subunits called S1 (between amino acids 14 and 685) and S2 (between amino acids 686 and 1273) [
10]. The first subunit binds to the host cell receptors, while the second subunit contains the machinery responsible for fusion with the host membrane. Subunit S1 has three main domains: the N-terminal domain (between amino acids 14 and 305), the receptor binding domain (between amino acids 319 and 541), and a carboxy-terminal domain (between amino acids 542 and 681). Subunit S2 instead consists of the hydrophobic fusion peptide (between amino acids 788 and 806), the two heptapeptide repeats, repeat 1 (between amino acids 912 and 984) and repeat 2 (between amino acids 1163 and 1213), as well as the transmembrane domain (between amino acids 1213 and 1237) and the cytoplasmic domain (between amino acids 1237 and 1273) (
Figure 2).
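The domain boundaries listed above can be captured in a small lookup table. The following is a minimal sketch in Python, not part of the cited work: the names `SPIKE_REGIONS` and `regions_at` are hypothetical, and the ranges are copied as quoted in the text and treated as inclusive, so a residue on a shared boundary (such as 1213) is reported in more than one region.

```python
# Spike regions (amino acid positions) exactly as quoted in the text.
# Ranges are treated as inclusive; this is an assumption of the sketch.
SPIKE_REGIONS = {
    "S1 subunit": (14, 685),
    "S2 subunit": (686, 1273),
    "N-terminal domain": (14, 305),
    "receptor binding domain": (319, 541),
    "carboxy-terminal domain": (542, 681),
    "fusion peptide": (788, 806),
    "heptapeptide repeat 1": (912, 984),
    "heptapeptide repeat 2": (1163, 1213),
    "transmembrane domain": (1213, 1237),
    "cytoplasmic domain": (1237, 1273),
}


def regions_at(position):
    """Return every listed region whose inclusive range contains the residue."""
    return [
        name
        for name, (start, end) in SPIKE_REGIONS.items()
        if start <= position <= end
    ]


if __name__ == "__main__":
    # Positions chosen for illustration: 484 and 681 are sites of the
    # E484K and P681R substitutions; 1213 lies on a shared boundary.
    for residue in (484, 681, 1213):
        print(residue, regions_at(residue))
```

With these ranges, position 484 falls in the receptor binding domain of S1 and position 681 in the carboxy-terminal domain of S1, just upstream of the S1/S2 boundary.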
In the native form, the spike protein forms a trimer on the viral surface, with the three protomers, each consisting of an S1 and an S2 subunit, forming the typical extracellular stalk and bulbous “crown.” After cleavage of the S1 subunit, the S2 subunit is prepared for fusion with the host cell membrane by its fusion peptide.
Overall, SARS-CoV-2 has a mean size of 91 nm and a mean of 24 spike proteins on its surface, spaced approximately 10 nm apart [
11]. Regarding the possible origin of this coronavirus, phylogenetic analysis combined with structural modeling of the spike protein of SARS-CoV-2 revealed that this protein shares an overall 97% sequence identity with that of a bat coronavirus and 97% sequence identity with that of a pangolin coronavirus in the receptor binding domain [
12]. This suggests that SARS-CoV-2 may have arisen from the recombination of these two animal coronaviruses in an intermediate host and then infected humans as a result of a spill-over. SARS-CoV-2 also has 76% structural homology with SARS-CoV-1, the virus that caused the SARS outbreak nearly 20 years ago. However, the structural homology between these two viruses in the receptor binding domain is relatively low, less than 50%, which may help explain the different biological and clinical features of SARS and COVID-19.
Setting aside unsubstantiated hypotheses and conspiracy theories that SARS-CoV-2 escaped from, or was created or replicated in, a research laboratory, the animal spillover theory is the most widely accepted and is also consistent with what has happened in the past with other coronaviruses [
13]. One possible theory is that SARS-CoV-2 originated from a bat coronavirus, perhaps BatCoV-RaTG13, that was transmitted to an intermediate animal, perhaps a pangolin, in which the precursor virus underwent initial recombination. This new virus may then have been transmitted to the first human index case (where it became similar to the ancestral strain of SARS-CoV-2), which then caused an initial local outbreak. Since the initial spread of the virus in a likely limited human niche could not be detected in time, it is conceivable that further intra-human recombination occurred, eventually producing the highly virulent pathogen that caused the pandemic. This theory is supported by phylogenetic studies, which revealed that the ancestral BatCoV-RaTG13 coronavirus, which probably appeared in 2013, already possessed a latent capacity to bind to human cellular targets. However, this potential was completely unmasked only after the acquisition of successive mutations in the intermediate host and the human niche.
4. SARS-CoV-2 evolution
There is little doubt that SARS-CoV-2 has undergone many mutations since its first appearance in Wuhan and will continue to do so while it remains among us. This is not surprising, since viruses that encode their genomes directly in RNA, including HIV and influenza, rapidly accumulate mutations because they replicate within their hosts, where the enzymes that copy RNA are prone to error [
35]. Before specifically addressing the important issue of SARS-CoV-2 variants, it is important to recall some basic concepts of natural selection [
36]. First, variants of interest are defined as those that exhibit particular characteristics that warrant further evaluation. Variants of concern are defined as those with more specific and worrisome characteristics, such as increased transmissibility, association with more severe disease, or potential to affect diagnosis, therapy, or vaccine efficacy.
Regarding their origin, mutations are subject to two main pressures. Adaptive evolution is the process by which some advantageous non-synonymous substitutions become dominant in the viral sequence through positive selection. Viral strains carrying these mutations may have a selective advantage, replacing earlier variants and becoming dominant over time [
37]. Conversely, convergent evolution is the process by which favorable nonsynonymous substitutions in the viral sequence arise independently but simultaneously in different organisms, even in distant locations, due to positive selection because they provide a substantial advantage [
37]. Therefore, these mutations will characterize independent and even phylogenetically distant viral lineages. A paradigmatic example is the E484K mutation within the spike protein, which occurs in lineages B.1.351, P.1, and B.1.1.7. These two aspects are not infrequently found in nature.
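The distinction between synonymous and non-synonymous substitutions used above can be illustrated with a short Python sketch. It relies on the standard genetic code only; the function name `classify_substitution` is hypothetical, and the codon pair used for E484K (GAA to AAA) is an illustrative assumption rather than a value taken from the cited studies.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as synonymous or non-synonymous.

    Accepts RNA (U) or DNA (T) notation and returns a tuple
    (kind, reference amino acid, alternative amino acid).
    """
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    # Glutamate (E) to lysine (K): the amino acid changes.
    print(classify_substitution("GAA", "AAA"))
    # Both codons encode glutamate: the amino acid is unchanged.
    print(classify_substitution("GAA", "GAG"))
```

Only the first kind of change alters the protein and can therefore be acted on directly by positive selection at the protein level.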
Thus, SARS-CoV-2 has mutated and will continue to do so for several important reasons, most notably to increase its affinity for host cell receptors or to evade the immune response, both of which may act synergistically to increase the likelihood of host cell infection. The selection of many mutations is also driven by natural or vaccine-induced immunity, as well as by antiviral drugs, which contribute to the enrichment of the viral genome with mutations. This pressure leads to the emergence and disappearance of new SARS-CoV-2 variants over time [
38]. At this point, it is worth recalling that in acute infection, viral load increases rapidly until the host immune response begins to clear the virus, whereupon viral load suddenly decreases. During this period of rising and then falling viral load, new mutations can arise in the viral genome, become incorporated, and be transmitted to other individuals [
39]. Although recent data support the concept that the mutation rate of SARS-CoV-2 is many times (up to 20) lower than that of influenza virus [
40], phylogenetic analyses have allowed the estimation that the SARS-CoV-2 genome undergoes nearly 8 mutations per month, mainly caused by random episodic increases in the substitution rate [
41].
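To compare this figure with rates reported per site, the monthly estimate can be converted into substitutions per site per year. The sketch below is a back-of-the-envelope calculation only: it assumes that the figure of 8 mutations per month applies to the whole genome, uses the approximate genome length of 30,000 bases given earlier, and the function name is hypothetical.

```python
# Values quoted in the text; both are approximate.
GENOME_LENGTH = 30_000       # bases
MUTATIONS_PER_MONTH = 8      # genome-wide estimate


def per_site_per_year(mutations_per_month, genome_length):
    """Convert a genome-wide monthly mutation count into a per-site yearly rate."""
    return mutations_per_month * 12 / genome_length


if __name__ == "__main__":
    rate = per_site_per_year(MUTATIONS_PER_MONTH, GENOME_LENGTH)
    print(f"{rate:.1e} substitutions per site per year")
```

Under these assumptions, the estimate corresponds to roughly 3 substitutions per 1000 sites per year.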
Numerous studies have helped to clarify the important intra-host evolution of SARS-CoV-2. In their analysis, Shen and colleagues found that the median number of intra-host variants of SARS-CoV-2 in infected patients ranged from 1 to 4, with a very wide range spanning between 0 and 51 in different patients [
42]. In a subsequent study involving a large number of COVID-19 patients with persistent viral shedding [
43], the authors found that nearly 93% of the samples collected from infected individuals contained at least one intra-host single nucleotide variant. Each sample contained a mean of approximately 20 intra-host single nucleotide variants, and the risk of developing such variants was higher in older adults, male subjects, and individuals with prolonged viral shedding. In a subsequent investigation, the authors found intra-host single nucleotide variants in 68% of COVID-19 patients, with a median frequency of 1 and a range between 0 and 45. Up to 24% of these intra-host single nucleotide variants were synonymous, whereas 76% were non-synonymous. The S gene had the highest frequency of non-synonymous intra-host single nucleotide variants, followed by ORF1a, ORF1b, and N [
44]. An even higher rate of intra-host single nucleotide variation was found in a large multicenter study conducted by Pathak et al. [
45], with such variations occurring in up to 82% of all samples tested. Importantly, this analysis confirmed that the Delta and Kappa lineage-defining variations first developed as intra-host single nucleotide variations before becoming fixed in the population, thus confirming that this is the most likely pathway supporting the emergence of novel SARS-CoV-2 variants. Other interesting aspects have emerged from the study published by Laskar and colleagues [
46]. In addition to detecting a large number of intra-host variants, consistent with previous data, the authors found that the number of intra-host variations was higher in deceased patients (i.e., almost twice that in asymptomatic patients). Not unexpectedly, the highest number of mutations involved the spike protein, followed by the nucleocapsid and NSP12 proteins.
An important aspect of the emergence of new variants is the fact that the probability of occurrence of non-synonymous mutations and new variants seems to be influenced by the duration of active infection, thus suggesting that higher variability could be observed with longer viral persistence and replication in the human body [
43]. Nevertheless, one myth that has since been dispelled is that vaccination can increase the likelihood of new variants evolving within the host. According to data published by Al-Khatib et al. [
47], the number of quasispecies detected in vaccinated patients with breakthrough infections was almost identical to, if not lower than, that in unvaccinated patients with homologous infection with the same SARS-CoV-2 strain. Similarly, analysis of the number of intra-host variants in patients with infections by different SARS-CoV-2 strains did not reveal significant differences between unvaccinated and vaccinated individuals.
Another mechanism worth highlighting that leads to the emergence of new SARS-CoV-2 variants is intra-host recombination between two different SARS-CoV-2 variants [
48]. The recombination events between the two lineages can thus generate a third viral lineage that may contain selective mutations of each original strain. This mechanism has likely become very common recently, as many individuals have become infected with multiple variants. It is likely the basis for the recent emergence of the so-called “recombinant strains” [
49], which now include the XBB Omicron sublineages (probably derived by recombination of BA.2 with BA.2.75.2).
BA.2.86 is the most recent SARS-CoV-2 variant that WHO is intensively monitoring because it has a substantial number of mutations in the spike protein that may contribute to its immune escape potential. Four of these mutations (K417N, S477N, N501Y, and P681R, which exist only in BA.2.86) have raised serious concerns [
50]. Interestingly, P681R is located at the furin cleavage site and is thought to enhance cleavage of the S1 and S2 subunits of the spike protein, leading to better penetration into host cells. This mutation was already present in the Delta variant and likely contributed to the wide distribution and the high pathogenic potential of this former SARS-CoV-2 strain [
50].
What is concerning now, especially with the emergence of XBB sublineages and BA.2.86, is that the level of antigenic difference compared to the original Wuhan strain is equivalent to the difference between SARS-CoV-1 and SARS-CoV-2, which has led some authors to make the paradoxical suggestion of reclassifying these novel variants as “SARS-CoV-3” [
51,
52].
However, from a clinical perspective, a comprehensive assessment of the clinical impact of the most prevalent SARS-CoV-2 variants over time has shown that the burden of hospitalizations and ICU admissions has decreased dramatically over time. This change can be attributed to a kaleidoscope of factors, such as more efficient and timely diagnosis of SARS-CoV-2 infection, better availability of human and technical resources in the healthcare system, improved prevention and therapeutic treatment, the spread of vaccine-triggered or natural immunity, but certainly also to an attenuation of the aggressiveness and pathogenicity of SARS-CoV-2 [
53]. Very similar evidence has emerged from an England-based nationwide study, which found that the worst variant in terms of hospitalizations and deaths was Alpha, followed by Delta, the ancestral strain, and Omicron [
54].
Post-mortem studies clearly demonstrate that the clinical burden of COVID-19 has varied widely throughout the pandemic. In a recent study, Schwab et al. reported that the proportion of patients with SARS-CoV-2 infection who died directly from COVID-19 was highest during the predominance of the Alpha variant, intermediate during the waves dominated by the Delta and ancestral strains, and lowest (i.e., around 10%) after the emergence of the Omicron sublineages [
55]. Accordingly, cumulative lung injury was the highest in patients who died from the Alpha and Delta variants compared with those who died from infection with the Omicron sublineages.
6. Intracellular processing
The intracellular processing of SARS-CoV-2 involves a number of multifaceted mechanisms, which are described in detail in the article by Martin-Sancho et al. [73] and are therefore only briefly summarized here. These events can be divided into a series of sequential steps, starting with virus-cell interaction, followed by membrane fusion and release of viral RNA into the host cell, where it is translated and then replicated by the RdRp, and ending with viral particle assembly, budding, and extracellular release. A single viral particle originating in the host cell is estimated to contain approximately 35-40 viral RNA–protein complexes, each consisting of approximately 12 nucleocapsid copies and 800 nucleotides of genomic RNA [
74].
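These packaging figures can be checked against the genome length of approximately 30,000 bases given in the section on the pathogen. The following Python sketch is a simple consistency check under the assumption that every complex carries the same 800 nucleotides and 12 nucleocapsid copies; the constant and function names are hypothetical.

```python
# Approximate values quoted in the text.
NUCLEOTIDES_PER_COMPLEX = 800
NUCLEOCAPSID_COPIES_PER_COMPLEX = 12
GENOME_LENGTH = 30_000  # bases


def packaged_totals(complexes):
    """Return (nucleotides, nucleocapsid copies) for a given number of complexes."""
    return (
        complexes * NUCLEOTIDES_PER_COMPLEX,
        complexes * NUCLEOCAPSID_COPIES_PER_COMPLEX,
    )


if __name__ == "__main__":
    for complexes in (35, 40):
        nucleotides, copies = packaged_totals(complexes)
        print(f"{complexes} complexes: {nucleotides} nt, {copies} nucleocapsid copies")
    # Number of complexes needed to cover the whole genome.
    print(GENOME_LENGTH / NUCLEOTIDES_PER_COMPLEX)
```

The range of 35 to 40 complexes corresponds to 28,000 to 32,000 nucleotides, which brackets the genome length, and to roughly 420 to 480 nucleocapsid copies per particle.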
The lack of a timely and appropriate antiviral response is another critical mechanism related to the intracellular processing of SARS-CoV-2. In particular, the protein ORF6 and possibly other viral proteins such as NSP1, NSP3, NSP12, NSP13, NSP14, ORF3, and M appear to play a role in limiting early interferon production and associated downstream signaling [
75]. Because the type I interferon system is an essential component of the innate immune response, neutralizing this pathway may be responsible for some of the virulence of SARS-CoV-2. This concept has been confirmed by other studies, such as that of Xia et al. [
76], which showed that all three of the more lethal coronaviruses, SARS-CoV-1, SARS-CoV-2, and MERS-CoV, are capable of inhibiting the interferon I pathway with varying degrees of efficacy. Although several viral proteins effectively inhibit interferon I signaling, the NSP1 and NSP6 proteins of SARS-CoV-2 were more efficient than the homologous proteins of the other two coronaviruses.