1. Introduction
Coronaviruses employ complex replicative strategies involving long-range RNA-RNA interactions. These strategies, which include genome circularization, discontinuous transcription, and viral enhancers, approximate regulatory sequences in their large genomes, influencing replication, pathogenicity, and immune evasion [1,2,3,4,5,6].
Genome circularization approximates regulatory sequences in the 5’- and 3’-untranslated regions (UTRs) via the interaction of complementary sequences in the UTRs, facilitated by viral and host protein bridges, during the synthesis of subgenomic negative-sense strands [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. Discontinuous transcription generates a nested set of subgenomic RNAs via transcription regulatory sequence (TRS)-dependent template switching. The 5’-UTR TRS-leader (TRS-L) interacts with homologous TRS-body (TRS-B) elements upstream of structural and accessory genes in the last third of the genome, driven by the extent of TRS base-pairing, viral and host protein-RNA binding, and high-order RNA-RNA interactions [6,22,23,24,25].
Regarding viral enhancer elements, a long-range interaction spanning approximately 26 kilobases was identified in the Alphacoronavirus transmissible gastroenteritis virus (TGEV). This interaction is required for efficient transcription of the subgenomic mRNA encoding the N gene, the most abundant TGEV subgenomic mRNA despite its less robust TRS-L-TRS-B interaction (Figure 1) [24,25]. This nonanucleotide TGEV enhancer is conserved among other Alphacoronaviruses, such as feline infectious peritonitis virus and canine enteric coronavirus. Via long-distance RNA-RNA interactions between complementary proximal and distal elements, the TGEV enhancer brings TRS-L and the TRS-B upstream of the N gene closer together for discontinuous transcription, resulting in higher N gene expression levels [24,25]. The coronaviral N protein, in turn, has been postulated to act as a transactivator of gene expression similar to the lentiviral Tat protein via interactions with the 5’-UTR leader sequence and with viral and host proteins in the replication-transcription complex [26].
A nonanucleotide enhancer element (UUUAUAAAC) was also characterized in mouse hepatitis virus (MHV), a Betacoronavirus (subgenus Embecovirus), just distal to its TRS-L [27] (Figure 1B). However, the full mechanism underlying the enhancer activity has not been characterized.
The high prevalence of long-range RNA-RNA interactions in both coding and non-coding regions of coronaviruses and other RNA viruses, including insect nodaviruses [28] and plant tombusviruses [29,30], the phylogenetic conservation of the involved RNA motifs, and the considerable level of global organization [31] in coronaviral replicative strategies together reinforce the relevance of these high-order structures for the transcription of numerous distinct subgenomic mRNAs.
Long-distance RNA-RNA interactions contribute to the complexity of RNA three-dimensional structure and to the regulation of gene expression, translation, and resistance to degradation, among other functions [32].
Although genetic and biochemical analyses have confirmed the functional importance of many of these structures, their precise roles remain to be fully defined [33,34,35]. Understanding the noncoding functions of viral RNA during replication provides new insights into virus-host interactions and therapeutic targets. That is the focus of this study, which predicts coronaviral enhancers and, in the case of IBV, analyzes their impact using published cases of viral attenuation or its reversal.
3. Discussion
In the present study, using the nonanucleotide-based enhanceosomes of Alphacoronaviruses and a Betacoronavirus, together with experience with other viral and host genome enhanceosomes, we detected potential enhancers, which vary in primary sequence and location, in the phylogenetically ancient Gammacoronavirus IBV and in more recent Betacoronaviruses, including SARS-CoV-2. The proposed enhancer model comprises an epistatic network built around a core duplex-forming region that acts as a molecular switch. This region has identical sense and antisense sequences and complementary halves, and it could transition to an open configuration as dictated by viral or host proteins or RNAs, allowing robust pairing with proximal or distal complementary sequences elsewhere in the viral genome to enhance specific viral gene expression.
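The two sequence properties attributed to the core duplex-forming region (identical sense and antisense sequences, and complementary halves) can be checked mechanically. The following is a minimal sketch, not the authors' method: the function names are hypothetical, only Watson-Crick pairs are considered, and the example input is the first eight nucleotides of the MHV nonanucleotide quoted in the Introduction, chosen purely for illustration.

```python
# Illustrative helpers (hypothetical names, not from the paper).
# Assumption: only Watson-Crick pairs (A-U, G-C) count; G-U wobble is ignored.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def normalize(seq: str) -> str:
    """Upper-case the sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, read 5'->3'."""
    return normalize(seq).translate(COMPLEMENT)[::-1]


def reads_same_on_both_strands(seq: str) -> bool:
    """True when the sense and antisense strands have the same sequence."""
    return normalize(seq) == reverse_complement(seq)


def halves_are_complementary(seq: str) -> bool:
    """True when the 5' half can base-pair with the 3' half."""
    rna = normalize(seq)
    half = len(rna) // 2
    return rna[:half] == reverse_complement(rna[len(rna) - half:])


if __name__ == "__main__":
    octamer = "UUUAUAAA"   # first eight nucleotides of UUUAUAAAC
    print(reads_same_on_both_strands(octamer))
    print(halves_are_complementary(octamer))
    print(reads_same_on_both_strands("UUUAUAAAC"))
```

For an even-length sequence the two checks coincide; they are kept separate because an odd-length region can have complementary halves around an unpaired central nucleotide without being identical on both strands.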
Increased host immune evasion provides a replicative advantage to the virus. The coronaviral enhancer models proposed and reviewed here provide an alternative potential mechanism for viral attenuation, a process with multiple underlying mechanisms yet to be fully characterized. Analyses of sequenced IBV strains suggest disruptive effects of variation in the NSP16 gene duplex-forming region, in the remote complementary sequences in the Spike gene upstream of the immune evasion-related ORF3a, or in both. These effects are consistent with some documented cases of viral attenuation and point to these elements as potential novel pathogenicity determinants. The MHV and SARS-CoV-2 enhancers are also predicted to affect the expression of viral genes involved in host immune evasion. Although the duplex-forming sequence in SARS-CoV-2 appears thus far conserved among variants, variation in it may adversely affect virulence. Based on the IBV experience, it may be too early to detect such variation in SARS-CoV-2. The same applies to SARS-CoV-1, where the scarce mutations observed do not adversely affect the minimum free energy of the duplex.
The features of the potential coronavirus enhancers are similar to those of enhancers overall. Enhancers, as cis-acting key scaffolds for the transient dynamic recruitment and assembly of transcription factor/coactivator clusters, integrate regulatory information encoded by the surrounding genome and the biophysical properties of trans-acting components of the transcription machinery, such as RNA polymerase and transcriptional coregulators [103,104,105,106]. All the coronaviral duplex-forming structures described here have repeats of similar motifs in different orientations, possibly mediating binding to viral and host factors involved in the potential enhancer function, consistent with the redundancy of transcription-factor-interacting sequences within enhancers.
Enhancers control gene expression location, level, and timing [58,107,108]. In mammalian systems, enhancers determine spatiotemporal gene expression programs by engaging distant promoters over long genomic ranges [109,110,111,112,113]. For instance, some enhancer-promoter RNA interaction sites involve pairwise interacting Alu and non-Alu RNA sequences that tend to be complementary and potentially form duplexes [109].
Beyond the potential coronaviral enhancers analyzed here, we annotated duplex-forming regions reading similarly in the sense and antisense directions with complementary halves in MHV and bovine coronavirus, up to 83 nucleotides long (Figure 20). The latter two viruses illustrate the presence of possible networks of different potential enhancer elements with more than one long-range RNA-RNA interaction. They are reminiscent of the nested epistasis enhancer networks for robust genome regulation reported in mammalian genomes [114].
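Annotating regions that read the same in the sense and antisense directions amounts to searching a sequence for inverted repeats. The sketch below is one simple way to do this, offered under stated assumptions rather than as the procedure used in this study: it expands outward from every position between two nucleotides, accepts only perfect Watson-Crick pairs (no G-U wobble, mismatches, or bulges), and does not compute free energies. The function name, the `min_len` threshold, and the toy input are hypothetical.

```python
# Hypothetical scanner for perfect inverted repeats (Watson-Crick pairs only).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_inverted_repeats(seq: str, min_len: int = 8):
    """Return (1-based start, subsequence) for each maximal window that
    equals its own reverse complement and is at least min_len long."""
    rna = seq.upper().replace("T", "U")
    hits = []
    for center in range(1, len(rna)):
        left, right = center - 1, center
        # Grow the window while the outermost nucleotides can pair.
        while left >= 0 and right < len(rna) and (rna[left], rna[right]) in PAIRS:
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 2, rna[left + 1:right]))
    return hits


if __name__ == "__main__":
    # Toy sequence: an octamer flanked by nucleotides that cannot pair.
    toy = "GGGG" + "UUUAUAAA" + "GGGG"
    print(find_inverted_repeats(toy))
```

Because real duplexes tolerate wobble pairs and internal loops, a scan like this would only seed candidates; a thermodynamic folding tool would still be needed to rank them by minimum free energy.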
Regarding the possible origin of the duplex-forming regions reading similarly in the sense and antisense directions with complementary halves (inverted repeats) and potential enhancer function, one could propose a template-switching mutation mechanism during RNA replication by the RNA-dependent RNA polymerase. This mechanism would be similar to that postulated by Mönttinen et al. [115], in which a template-switching mutation mechanism during DNA replication by DNA-dependent DNA polymerase could generate hairpin structures via inverted repeats, or inverted and direct repeats, that could evolve into novel microRNAs [115]. The observation by Mönttinen et al. [115] that template-switching mutation-driven evolution at the nucleic acid level can occur without affecting the encoded protein sequence is consistent, for instance, with the observation here that although the duplex in SARS-CoV-1 ORF3a lacks a homolog in SARS-CoV-2, the encoded peptide sequences are similar. The same applies to variation in the potential IBV enhancer element at the nucleotide level, with the amino acid sequences encoded by the duplex region being conserved among strains.
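The distinction drawn above, nucleotide variation that leaves the encoded amino acids unchanged, can be made concrete by translating a reference and a variant sequence codon by codon. The following is a minimal sketch using the standard genetic code; the function names are hypothetical, the sequences are invented toy inputs rather than data from this study, and the reading frame is assumed to start at the first nucleotide.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate complete codons, assuming the frame starts at position 1."""
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def classify_codon_changes(ref: str, alt: str):
    """List (codon number, ref codon, alt codon, kind) for each changed codon."""
    ref_rna = ref.upper().replace("T", "U")
    alt_rna = alt.upper().replace("T", "U")
    usable = min(len(ref_rna), len(alt_rna))
    usable -= usable % 3
    changes = []
    for i in range(0, usable, 3):
        r, a = ref_rna[i:i + 3], alt_rna[i:i + 3]
        if r != a:
            same = CODON_TABLE[r] == CODON_TABLE[a]
            changes.append((i // 3 + 1, r, a, "synonymous" if same else "nonsynonymous"))
    return changes


if __name__ == "__main__":
    reference = "GCUUUAAAA"   # toy input, not a viral sequence
    variant = "GCCCUAAGA"     # toy input, not a viral sequence
    print(translate(reference), translate(variant))
    print(classify_codon_changes(reference, variant))
```

A duplex region whose variants yield only synonymous changes would fit the pattern described here, in which RNA structure can vary while the protein sequence is conserved.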
The observation that the sense and antisense strands have the same or almost the same sequence in the duplexes described also raises the possibility that the enhancer activity may extend to gene expression by the antisense strand. As is the case for the transcriptomes of cytomegaloviruses [116], retroviruses [117,118,119], and prokaryotes [120,121], that of SARS-CoV-2 might include RNAs that are transcribed from the negative-sense genomic RNA and encode functional proteins or nucleic acids involved in RNA regulation [122]. For instance, HTLV-1 antisense strand-encoded mRNA interacts with the promoter and enhances transcription of the C-C chemokine receptor type 4 (CCR4) gene to support the proliferation of HTLV-1-infected cells, and HIV-1 antisense mRNA is recruited to the viral long terminal repeat and inhibits sense mRNA expression to maintain the latency of HIV-1 infection [123].
The potential enhancer elements and models discussed here require direct experimental validation. However, delineating possible long-range RNA-RNA interactions can enrich the prediction and analysis of RNA folding and of potential novel structures, which remain challenging for in silico tools, none of which offers full accuracy [32,124].
The experience with IBV strains provides fertile ground for these explorations, illustrating the well-documented intricacies and limitations of ascribing viral attenuation to a particular mechanism given the diversity and frequency of mutations and recombination events among an ever-growing number of variants. However, the characterization of all possible attenuation and enhancement mechanisms of viral pathogenesis bodes well for the rational development of preventive and therapeutic strategies for all coronaviruses. Similar to the use of combination therapies against retroviruses [122], the combination of attenuation mechanisms may allow the development of more effective coronaviral vaccines [125,126,127].
Figure 1.
Nonanucleotide enhancer in transmissible gastroenteritis virus (TGEV, Alphacoronavirus, subgenus Tegacovirus, GenBank Accession: NC_038861). The TGEV enhancer upregulates the expression of the subgenomic RNA encoding the nucleocapsid (N) protein, possibly by approximating the TRS-L and TRS-B of N via duplex formation between the distal (close to the middle of the M gene) and proximal (7 nucleotides upstream of the N TRS-B) enhancer elements. Despite its sequence similarity to the distal element, the intermediate element shown does not contribute to enhancer activity, consistent with its high minimum free energy.
Figure 2.
A. TGEV enhancer-based model for the mechanism underlying MHV (Betacoronavirus, subgenus Embecovirus, GenBank Accession: NC_048217.1) enhancer function. Located immediately after the leader sequence, the MHV enhancer (green) could pair with a sequence towards the end of the genomic region encoding the ORF1b polyprotein. Because ORF1ab covers approximately two-thirds of the genome, the pairing would bring the TRS-L (red) closer to the first genomic TRS-B upstream of ORF2, potentially enhancing the transcription of its subgenomic RNA in a sequence-specific manner. B. Bovine CoV (Betacoronavirus, subgenus Embecovirus) has an MHV-like enhancer also immediately after the 5’-UTR leader sequence and similar sequences at three distal positions in the genome, which could pair with the 5’-UTR sequence. Minimum free energies (kcal/mol) are shown relative to the 5’-UTR sequence and the one located at position 4,434; the latter pairings are predicted to be more stable. Similarities with the leader, TRS-L, enhancer, and beyond are highlighted in purple, red, green, and blue, respectively. C. Enhancer sequence shared between MHV and bovine CoV includes an octanucleotide reading the same in the sense and antisense strands (green arrows) and with complementary halves (blue and red arrows).
Figure 3.
A. Similarity between the NSP16 duplex in avian infectious bronchitis virus (IBV; Gammacoronavirus, subgenus Igacovirus) and the MHV enhancer (green). Red and blue arrows indicate complementary halves that can form a duplex. B. NSP16 duplex (with the same sense and antisense sequences) and extended duplex. ΔG is the minimum free energy in kcal/mol. C. Similar distal sequences potentially pairing with the NSP16 duplex. One sequence is proximal (region encoding the first protein in ORF1a) and the other distal (region encoding spike [S]) to the duplex.
Figure 4.
Mutations (highlighted in light blue) in the avian IBV NSP16 extended duplex (complementary halves in blue and red), duplex minimum free energy (ΔG), number of GenBank strains with each mutation combination, translated amino acid sequence (conservative substitutions in green, nonconservative ones in magenta), and avian IBV origin other than chicken.
Figure 5.
Examples of changes in the NSP16 duplex, the distal binding sequence, or both, which may underlie published cases of viral attenuation (Massachusetts and China strains) or its reversal (Taiwan strains). The NSP16 duplex is shown vertically in an open configuration with complementary halves in blue and red. Mutations are highlighted in light blue.
Figure 6.
Analyses of possible association between extended NSP16 duplex variation and viral attenuation or its reversal.
Figure 7.
Relationship between extended NSP16 duplex minimum free energy and frequency of viral attenuated/vaccine-derived/vaccine revertant strains.
Figure 8.
Similarities of the IBV NSP16 extended duplex with Rousettus bat beta-coronaviruses (Nobecoviruses), which are closely related to SARS-CoV-1 and -2 and can utilize the human ACE2 receptor in vitro. 1. Extended duplex in IBV (NC_001451). 2. OQ175246.1 (Bat CoV RlYN17 [Rousettus leschenaultia], China/Yunnan, 2016, isolate BtR1-BetaCoV/YN2016-Q319, toward end of ORF1ab); OQ175248.1 (Bat CoV RlYN17 [Rousettus leschenaultia], China/Yunnan, 2016, isolate BtR1-BetaCoV/YN2016-Q320, toward end of ORF1ab); OQ175341.1 (Bat CoV RlYN17 [Rousettus leschenaultia], China/Yunnan, 2017, isolate BtR1-BetaCoV/YN2017-Q321, toward end of ORF1ab starting at position 20,136). 3. MK492263.1 (Rousettus Bat CoV strain BtCoV92, Cynopterus brachyotis, Singapore, 2015). 4. OM219649.1 (Bat CoV GCCDC1, Eonycteris spelaea, Cambodia, 12/18,19/2010, isolate RK091); KU762332.1 (Rousettus leschneaulti Bat CoV isolate GCCDC1 356, China, 05/28/2014); NC_030886.1 (Rousettus leschneaulti Bat CoV isolate GCCDC1 356, China, 05/28/2014); KU762337.1 (Rousettus leschneaulti Bat CoV isolate GCCDC1 346, China, 05/28/2014); MT350598.1 (Rousettus bat CoV GCCDC1, Eonycteris spelaea, Singapore, 10/2016, beta-CoV, Nobecovirus); OQ175331.1 (Bat CoV EsYN16, Eonycteris spelaea, China/Yunnan, 2016, BtEs-13BetaCoV/YN2016-Q311); OQ175332.1 (Bat CoV EsYN17, Eonycteris spelaea, China/Yunnan, 2017, BtEs-13BetaCoV/YN2017-Q312); OQ175333.1 (Bat CoV EsYN17, Eonycteris spelaea, China/Yunnan, 2017, BtEs-13BetaCoV/YN2017-Q313); OQ175242.1 (Bat CoV EsYN17, Eonycteris spelaea, China/Yunnan, 2017, BtEs-13BetaCoV/YN2017-Q309). Duplex complementary halves are highlighted in blue and red. Differences in bat sequences relative to the IBV extended duplex are highlighted in light blue. Nucleotides shared among sequences are highlighted in grey. Minimum free energy (ΔG) is shown for each duplex, as is the degree of similarity of each duplex relative to the IBV reference duplex, expressed as the expect value (e).
Figure 9.
Duplex-forming sequences reading the same in the sense and antisense directions in the NSP3 gene of SARS-CoV-2 (A) and SARS-CoV-1 (B), and in NSP16 of IBV. The complementary halves are highlighted in blue and red. Extended duplex regions are also shown. Regions of similarity within the SARS-CoV-2 extended duplex are highlighted in light blue with arrows indicating that they are in inverted orientations. Similar sequences within and among SARS-CoV-2 and SARS-CoV-1 and in IBV are highlighted in pink. Minimum free energies are shown for all duplexes.
Figure 10.
A. Region in SARS-CoV-2 5’-UTR with MHV-like enhancer and similarity to NSP3 duplex, with which it can pair, leaving the duplex free to pair with three other distal genomic regions. The pairings involving the S gene (the first gene after ORF1ab; e = 2.4 for sequence similarity to the NSP3 duplex) would approximate the TRS-L to the TRS-B of the gene encoding the viroporin ORF3a. The pairings involving NSP4 and NSP10 (e = 0.6 for both sequence comparisons to the NSP3 duplex) would also decrease the distance between the TRS sequences. B. Sequence of the SARS-CoV-2 leader (blue box) that precedes the MHV-like enhancer (yellow letters with a dark green background) and that is added via discontinuous transcription to all accessory and structural genes. C. Similarities among SARS-CoV-2 5’-UTR sequence after leader, NSP3 duplex, and TGEV proximal enhancer element.
Figure 11.
A. NSP3 duplex and complementary sequence in the SARS-CoV-2 genome. The pair can form a duplex with a minimum free energy (ΔG) similar to the NSP3 duplex. B. Switch model for the opening of the NSP3 duplex and interaction with a 17-nucleotide complementary sequence in ORF3a. Initially, the duplex could be stabilized by RNA-RNA, RNA-protein, and protein-protein interactions involving undetermined factors, here termed X. Upon removal of said factors by epigenetic modification or interaction with other proteins or regulatory RNAs, the duplex could open and interact with other genomic sequences. In the illustration, the duplex and complementary sequences are 20,252 nucleotides apart, and their interaction is more stable, in terms of ΔG and the length of the sequences involved, than those between TRS-L (6 nucleotides long) and TRS-Bs during discontinuous transcription of accessory and structural genes. A protein (named here Y or a combination of factors) could stabilize the new duplex between distant complementary sequences. C. Positions (depicted with stars) in the SARS-CoV-2 genome of the NSP3 duplex (purple) and complementary sequences in Panel A and Figure 10. Numbers next to positions correspond from lowest to highest ΔG; e ranged from 0.6 to 2.4.
Figure 12.
The SARS-CoV-2 NSP3 duplex-forming 36-nucleotide sequence is present only in closely related bat Sarbecoviruses and SARS-CoV-1 among all Viridae. Nucleotide changes among all isolates are highlighted in light blue and those unique to SARS-CoV-1 are in light green. Sequences in SARS-CoV-2-related bat coronaviruses could be divided into four groups: With an identical 36-nucleotide segment (ΔG = -22.5): BetaCoV_Yunnan_Rp_JCC9_2020 (OK287355.1); BANAL-20-236/Laos/2020 (MZ937003.2); BANAL-20-247/Laos/2020 (MZ937004.1); BANAL-20-116/Laos/2020 (MZ937002.1); BANAL-20-103/Laos/2020 (MZ937001.1); BANAL-20-52/Laos/2020 (MZ937000.1); RpYN06 strain bat/Yunnan/RpYN06/2020 (MZ081381.1); isolate PrC31 (MW703458.1). With identical nucleotides 1-33 except nucleotide 3 (ΔG = -17.3): isolates RsHB20 BtRs-BetaCoV/HB2020-Q329 (OQ175349.1), Jingmen Rhinolophus sinicus betacoronavirus 1 (MZ328294.1), SC2018B (OK017846.1), and BM48-31/BGR/2008 (NC_014470.1). With identical nucleotides 4-33 (ΔG = -17.1): Horseshoe bat Sarbecovirus isolates Rt22QT53 (OR233321.1), Rt22QT48 (OR233320.1), Rt22QT46 (OR233319.1), Rt22QT36 (OR233318.1), Rt22QT178 (OR233317.1), Rt22QT161 (OR233316.1), Rt22QT124 (OR233300.1), Rt22QB8 (OR233299.1), and Rt22QB78 (OR233298.1); and isolates BtSY1 (OP963575.1), HN2021F (OK017835.1), and HN2021E (OK017834.1). With identical nucleotides 1-36 except nucleotides 3 and 7 (ΔG = -15.1): isolates GD2019E (OK017828.1), GD2019D (OK017827.1), GD2019B (OK017826.1), GD2019A (OK017825.1), GD2017W (OK017824.1), GD2017P (OK017822.1). SARS-CoV-1 sequence in Tor2 (NC_004718) and Urbani (MT308984) strains.
Figure 13.
A complementary segment (11 nucleotides, e=0.3) to the NSP3 duplex of the SARS-CoV-1 Tor2 (NC_004718.3) and Urbani (MT308984) strains is similar to that in SARS-CoV-2. However, the minimum free energy is higher for the pairing between the switch duplex and the complementary sequence, rendering the pairing less stable.
Figure 14.
SARS-CoV-1 ORF3a duplex and complementary sequence in SARS-CoV-1 genome with a similar minimum free energy. In this case, the duplex switch structure is distal to the complementary sequence with which it can pair with a similar ΔG. However, the effect of reducing the distance between TRS-L and the TRS-B of the gene distal to the viroporin ORF3a, namely the viroporin E, is achieved.
Figure 15.
Comparison between the SARS-CoV-2 NSP3 duplex and the equivalent region in SARS-CoV-1, and between the SARS-CoV-1 ORF3a duplex and the equivalent region in SARS-CoV-2. The SARS-CoV-2 NSP3 duplex is relatively well conserved in SARS-CoV-1, while the SARS-CoV-1 ORF3a duplex is not conserved in SARS-CoV-2 (nucleotides differing between SARS-CoV-2 and -1 are shown in red). However, the encoded amino acid sequences are relatively well conserved (highlighted in yellow, as are similar nucleotides).
Figure 16.
Similarity between the enhancer elements of TGEV and MERS-CoV sequences. The intermediate sequence is the one with the highest similarity, yet in TGEV, it does not contribute to the enhancer activity. The TGEV enhancer distal and proximal sequences are partly present. The minimum free energy of the distal-proximal element pairing is higher than that in TGEV, rendering this potential enhancer unlikely to be active in MERS-CoV.
Figure 17.
Comparison between the NSP3 duplexes and extended duplexes of the Betacoronaviruses (subgenus Sarbecovirus) SARS-CoV-2 (A) and SARS-CoV-1 (B) and the NSP16 duplex and extended duplex of the Betacoronavirus (subgenus Merbecovirus) MERS-CoV. Regions of similarity are highlighted in light blue and pink, and arms of the duplex that read similarly in the sense and antisense directions are highlighted in dark blue and red.
Figure 18.
NSP3 extended duplex in hCoV-OC43 (Betacoronavirus, subgenus Embecovirus). Regions of similarity with the NSP3 duplexes in SARS-CoV-2 and -1 and MERS-CoV are highlighted in light blue and pink, and arms of duplex reading the same in the sense and antisense directions are highlighted in dark blue and red. Sequences in green letters highlighted in red correspond to repeated sequences in the duplexes described in this paper and shown in Figure 19.
Figure 19.
Extended duplexes of the Betacoronaviruses infecting humans. A. Multiple alignment of the duplexes with similarity percentages. The highest duplex similarity was between SARS-CoV-2 and -1, while the MERS-CoV extended duplex sequence was closest to that of IBV (61% similarity), both located in NSP16, and was also similar to the NSP3 duplexes of SARS-CoV-2 and -1 (67% and 58%). Asterisks denote nucleotides conserved among all duplexes, highlighted in either blue or green. B. Repeated sequences within duplexes and similarity among them. Repeated similar sequences within and among duplexes are highlighted in red, fuchsia (reverse of those in red), light blue (shared between SARS-CoV-2 and MERS-CoV), green (shared between MERS-CoV and IBV), and green letters (shared between MERS-CoV and IBV).
Figure 20.
Annotated coronaviral duplex-forming sequences that read similarly in the sense and antisense directions with complementary halves. Minimum free energies (ΔG) are shown next to each hairpin. GenBank accession numbers are shown in parentheses. Positions that do not read the same in the sense and antisense directions are underlined. The MERS-CoV NSP16 duplex is present in bat Merbecoviruses and pangolin CoV HKU4 (GenBank OM009282.1). Most of the MHV M duplex is in the bovine coronavirus M gene (OP866729.1), and the one in the bovine CoV NSP13 helicase is in the canine respiratory coronavirus (ON133844.1), also an Embecovirus. The sequences annotated here do not comprise an exhaustive list, and other coronaviruses that share some of these duplexes were mentioned in the text.
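The property annotated in Figure 20, a sequence that reads the same in the sense and antisense directions, can be checked computationally. The sketch below is illustrative only and is not the method used in this study: the function names and the toy sequences are hypothetical, and only Watson-Crick pairs are considered (G-U wobble pairs are ignored). It reports the positions at which the sense and antisense reads differ, analogous to the underlined positions in the figure.

```python
# Watson-Crick complements for RNA (assumption: G-U wobble pairs are ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the antisense strand written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatch_positions(seq: str) -> list[int]:
    """Return 1-based positions where the sense and antisense reads differ."""
    sense = seq.upper()
    antisense = reverse_complement(sense)
    return [i + 1 for i, (a, b) in enumerate(zip(sense, antisense)) if a != b]


# Hypothetical toy sequences, not taken from any coronaviral genome.
perfect = "GGAUCGAUCC"    # reads identically in both directions
imperfect = "GGAUCGAACC"  # one substitution breaks the symmetry

print(mismatch_positions(perfect))
print(mismatch_positions(imperfect))
```

An empty list indicates a sequence whose two halves are fully complementary. A single substitution produces a symmetric pair of mismatched positions, because each position is compared with its partner on the opposite arm.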