Introduction
The idea that the repetitive genome encodes genetic information by shape rather than by sequence is relatively new. The unit of information is the flipon, a genomic element that can adopt alternative structures under physiological conditions. The conformation formed depends on the repeat sequence involved. The classic example is provided by left-handed Z-DNAs and Z-RNAs (collectively called ZNAs) that are formed by runs of alternating guanosine and cytosine [
1,
2]. Collectively, the repetitive genome comprises over 50% of the human sequence, compared to 2.5% for protein coding genes.
Flipons in the B-DNA conformation have little informational value as the repeats are frequent in the genome. They also lack the complexity of codons, so do not contribute directly to the Watson and Crick genetics that focuses on protein variation. Instead, flipons alter the readout of genetic information by localizing structure-specific complexes to genomic loci able to power the flip from a right-handed B-DNA or A-RNA helix to an alternative DNA or RNA fold. The readout of RNAs then varies dynamically with flipon structure. Here the focus is on G-flipons that form G-quadruplexes (GQ) in DNA (dGQ), RNA (rGQ) or DNA/RNA hybrids (hGQ). GQ are inherently more stable than ZNA helices. Consequently, G-flipons can actuate biological processes that are quite distinct from those modulated by Z-flipons.
GQ forming sequences are defined by the canonical DNA motif (G₃N₁₋₇)₃G₃, where G is guanine and N is any nucleotide. Four G-bases hydrogen bond to each other to form a tetrad that then folds into a four stranded structure (
Figure 1A). In place of the Watson-Crick base-pairing scheme, the rather unconventional Hoogsteen hydrogen bonds stabilize the interaction (
Figure 1B, highlighted by colored shading). The G-tetrad was first observed in X-ray diffraction studies of 5'-GMP and 3'-GMP gels, each stacking the tetrads on top of another in a different manner [
3]. The preferred helical arrangement of GQ crystalline fibers was later revealed by structural studies of polyinosinic and polyguanylic RNAs [
4].
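The canonical motif given above translates directly into a regular expression. The following is a minimal sketch, assuming exact runs of three guanines as written in the motif and scanning only the strand supplied; the function name `find_gq_motifs` and the test sequence are hypothetical. A search of this kind only flags candidate sequences and does not show that a GQ actually forms in a cell.

```python
import re

# Canonical motif (G3 N1-7)3 G3: four runs of three guanines
# separated by three loops of 1 to 7 nucleotides each.
# The lookahead allows overlapping candidates to be reported.
GQ_MOTIF = re.compile(r"(?=((?:G{3}[ACGT]{1,7}){3}G{3}))")

def find_gq_motifs(sequence):
    """Return (start, matched_text) for each candidate on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in GQ_MOTIF.finditer(seq)]

# Hypothetical test sequence built from telomere-like GGGTTA repeats.
example = "TTA" + "GGGTTA" * 3 + "GGG" + "CCAT"
hits = find_gq_motifs(example)
print(hits)
```

A sequence with only three G₃ runs returns no hits, in line with the requirement for four runs in the motif.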
It was once widely believed that GQ did not exist in cells or that, if present, they predisposed to genetic instability and to disease [
5]. There was much excitement when the Tetrahymena telomere sequence repeats [
6] were shown to form GQ [
7]. However, later work revealed that telomeres in vivo were more likely to form a different type of structure called a T-loop [
8]. Closure of the loop led to formation of a three-stranded DNA structure that incorporated the single stranded telomeric end and a subtelomeric segment. This structure was protected by a shelterin protein complex. The T-loop model seemingly ruled out a role for GQ in telomere maintenance (but see below). The prevailing view that GQ were bad was reinforced by the many loss-of-function (LOF) helicase variants that were associated with human Mendelian diseases. The failure of these variants to resolve GQ was considered causal for the genomic instability, even though the helicases also resolve other non-B structures, such as cruciforms and the Holliday junctions (HJs) that form during recombination [
9]. Further, a role for GQ in pathology was suggested by an analysis of repeat expansion diseases. In some cases, the sequences involved were predicted to freeze in the GQ conformation, thereby interfering with a variety of cellular functions, including DNA replication, transcription and RNA processing [
10].
However, there was evidence that GQ played an essential biological role in the adaptive immune system. The GQ were associated with class switch recombination of immunoglobulin heavy chain (IgH) genes. Of interest were the noncoding switch (S) regions in the IgH gene that underwent transcription to produce R-loops. The non-template strand was G-rich and 2 to 10 kb in length. When displaced by RNA transcript, the single-stranded G-rich DNA was able to fold back on itself to form GQ [
11]. The targeting of the AID cytidine deaminase protein to the GQ structure by the helicase DDX1 was essential for both class switching and immunoglobulin somatic hypermutation that is critical for antibody affinity maturation [
11,
12,
13,
14]. The cytosine to uridine substitution catalyzed by the cytidine deaminase was not only mutagenic, but also recruited the repair machinery required for DNA recombination. In other contexts, GQ formation in G-rich DNA due to R-loop formation was proposed as pathogenic [
15].
Other experimental approaches to unraveling the biology of GQ were complicated by the equilibrium that exists between different flipon conformations, with the transition occurring in unmodified DNA and without requiring any strand cleavage [
1]. Early experiments using dimethyl sulfate footprinting of RNA failed to show the protection of guanine bases expected if a GQ had formed inside a cell [
16]. These results were interpreted to show that GQ were not biologically relevant. However, there was a problem with the experimental design: chemical modification of any G-quadruplexes that unfolded during the time course of the experiment would prevent the structure from reforming [
17]. In other words, the longer the experiment ran, the less chance there was of detecting the presence of GQ in a cell. Nevertheless, the study highlighted the possibility that GQ were formed dynamically in cells and that they were rapidly resolved to reform B-DNA.
There were also limitations to other experimental approaches designed to detect GQ. Tools designed to detect GQ in cells were themselves able to induce GQ formation. The risk of artefact increased when assays were performed on cell extracts. Here various factors came into play, such as the buffers used and the loss of proteins that might otherwise restrain the flip from B-DNA to GQ. Even well accepted ChIP-seq protocols for mapping protein interactions can potentially mislead, as recently shown by a stringent analysis of the GQ substrates bound by PRC2 (Polycomb Repressive Complex 2) [
18]. Combined, these uncertainties limited the widespread acceptance of G-flipons as important components of the genetic repertoire. The repetitive genome was just considered “junk” [
19].
The intent of this review is to integrate information from a wide range of research papers, including some whose significance has long been overlooked and that are not mentioned in many recent GQ reviews [
20,
21,
22,
23,
24,
25,
26,
27,
28]. The initial focus is on the genetic evidence that speaks to an early evolutionary role for G-flipons in maintaining genomic stability and on the proteins that localize the machinery required for nucleotide and base excision repair (NER, BER respectively) by inducing GQ formation. Different classes of helicase then power the resolution of GQ to reform B-DNA, completing the flipon cycle. By changing the readout of genetic information, flipons dynamically reprogram a cell in response to environmental perturbations.
I will then discuss roles for G-flipons in transcription that emerged later in evolution. This feature reflects a change in how GQ recognition occurs, from interactions involving single-stranded loops and modified bases, to those mediated by proteins that bind both B-DNA and G-quadruplexes through a different face of the same helix.
Biophysical and Computational Studies of the G-Quadruplex
The basic building block for a G-quadruplex is a guanine tetrad formed by Hoogsteen hydrogen bonding [
4] (
Figure 1B) between bases [
29]. Interestingly, the parallel nature of these bonds contributes to sigma bonds that increase the stability of the G-tetrads relative to those formed by xanthine where the bonding is anti-parallel [
30]. A recent review describes 48 different possible GQ folds, reflecting whether the four strands are parallel, anti-parallel or a mix, made from one to four different strands with lateral, diagonal or propeller loop topology [
31] (
Figure 1). Further, the guanosine residues may be either in the
syn or
anti conformation (with the guanine base either lying over the sugar or pointed away from it) [
32]. The GQ can also be left-handed [
33]. The folds are stabilized by a central metal, with a potassium ion preferred over the smaller sodium and lithium ions for parallel strand GQ. The metal preference for other GQ folds varies and depends on whether they are made from RNA or DNA [
34]. Non-consecutive guanosines can form tetrads with the extra residue everted from the stack to form a bulge. In the case of GGA repeats, the adenine bases that are excluded from the quadruplex can interact with the tetrads to produce a heptad structure [
35,
36].
The stability of GQ is also affected by the loop composition, decreasing with loop length, and varying with the loop nucleotide sequence [
37]. With long runs of G repeats, defined as over 500 bases in length, the loops can basepair to give even higher order structures. Of the 299 such long G runs reported, over 67% are located within 6 Mbp (megabase pairs) of telomeres [
38]. Interestingly, loop length and sequence variation have increased during evolution, especially in mammals, as have GQ length, number, and density in the genome [
39]. G-flipons are also more frequent on the non-template strand of coding genes [
40,
41].
Besides GQ formation by neighboring G3 repeats, it has been proposed that GQ are formed by a pair of G3 repeats in an enhancer and a pair of G3 repeats from a promoter [
42]. Further, a hybrid GQ can form between a pair of DNA G3 repeats in the non-template strand and a matching RNA G3 pair in the nascent transcript [
43]. The GQ formed by strands that are not physically connected to each other also show structural variation. The GQ can assemble by stacking tetrads, one on top of the other, or by pairing bases from the separate strands to form a G-wire [
44,
45]. G-wires were originally proposed to explain the alignment of homologous chromosomes during meiosis [
46]. Tetrads missing the fourth base can incorporate into the vacant space a guanine provided
in trans, potentially acting as a sensor for a local change in concentration of the replacement nucleotide [
47].
RNA tetrads only form parallel rGQ when G-repeats are contiguous [
34]. A variety of different rGQ folds are stabilized by pairing schemes involving G bases that are separated by other nucleotides [
48]. rGQ composed of only 2 tetrads have been reported [
49] and are stabilized by the 2′-hydroxyl group present in RNA [
34]. In contrast, there are many possible variations of dGQ composed of 3 or more tetrads, making it difficult to computationally predict from sequence alone those flipons that actually form dGQ in vivo. A database that combines results from a variety of experimental methods now overcomes this problem by providing a set of well validated G-flipons detected in many different studies using a variety of approaches [
50]. The mappings show that in the human genome, dGQ forming sequences are enriched in transcription start sites (TSS), in introns and at transcription termination sites (TTS) [
39].
GQ Binding Proteins
The plethora of different dGQ topologies allows for different modes of protein recognition (
Figure 2 and
Figure 3). Strategies to confirm these interactions and the specificity of binding to GQ include those that synthesize control oligonucleotides containing the 8-aza-7-deazaguanosine base (
Figure 1C) that will not form the Hoogsteen hydrogen bonds necessary to stabilize a GQ (
Figure 1B, crimson shading), despite having the same chemical composition as guanine [
51]. In these studies, different modes of docking to GQ have been identified, including binding to loop sequences or to 5' and 3' single-strand extensions that give the helicases something to pull on so that they can unwind the structure. Proteins can bind to loops formed when adducted bases such as 8-oxo-G prevent the incorporation of a DNA strand into a GQ, or to the everted bases across from an apurinic/apyrimidinic (AP) site. Proteins also dock to the planar tetrad surfaces that form the GQ endplate (
Figure 2). Specific binding to RNA rather than DNA GQ is favored by intrinsically disordered regions (IDR) enriched in arginine-glycine repeats, as recently reviewed [
52], and visualized in the crystal structure of the Fragile X Mental Retardation Protein (FMRP) bound to an RNA GQ [
53]. In principle, the preformed GQ site for docking IDR lowers the entropic cost of binding.
The stability of GQ and strength of their interaction with proteins can vary with the loop length and the loop sequence composition [
54,
55], as revealed by studies of nucleolin and the 2E4 Darpin [
56,
57]. Further, the latching of a single base by the REV1 polymerase [
58], and the docking to an AP site by APE1 (AP endonuclease 1) [
59], can create a surface that induces GQ folding. As we will discuss, the use of SANT (Swi3, Ada2, N-Cor, and TFIIIB) domains to recognize parallel-strand GQ is of particular interest as the domains can use the same helix to bind B-DNA in a sequence-specific manner (
Figure 2). In total, 50 GQ-peptide structures are present in the Protein Data Bank (PDB) showing a variety of interactions [
26,
57]. A subset of validated GQ interacting proteins is given in
Figure 3. Listings of additional proposed GQ binding proteins can be found in recent publications [51, 60] (He, 2023) and online at the G4IPBD database (
http://people.iiti.ac.in/~amitk/bsbe/ipdb/index.php, accessed 15th September, 2024) [
61] and the QUADRatlas database (
https://rg4db.cibio.unitn.it/, accessed 15th September, 2024) [
62].
The Accumulating Evidence for the Biological Importance of G-Quadruplexes
Despite the numerous challenges to studying the cellular functions of high energy and dynamic flipon conformations, much progress has been made. There are two key aspects to the biology: first the events that promote and resolve the formation of the alternative flipon structures and second, the transactions that the alternative flipon conformations modulate. There are well validated proteins that can induce the flip to GQ and many helicases capable of their resolution (
Figure 3, up and down arrows). Although GQ formation does not inherently require any change, modification, or cleavage of DNA or RNA, such events may change the propensity of G-flipons to flip from one conformation to another. The GQ formed in these processes differ in topology. The structured loops they form are recognized by specific sets of proteins, as are the GQ endplates (
Figure 3, top). The outcomes depend on which cellular machinery is localized to a particular GQ. The complexes formed enable cells to reprogram their responses to environmental perturbations.
The trans actions occurring between GQ formed at different sites are also important in understanding their cellular functions. The complexes nucleated by one GQ have the potential to associate with other G4-anchored structures to form membraneless condensates (
Figure 4) [
64,
65]. These complexes can be quite large and visible by light microscopy [
64]. The interactions enable the sequencing and timing of events within the cell (
Figure 4A). Pairing of promoter GQ with GQ formed at enhancers, splice sites and polyadenylation sites then generates production lines for the processing of transcripts. Anchoring of the lines to the nuclear scaffold forms factories [
66,
67] that enable the transcriptional bursts associated with gene expression [
68]. The pliability of these production lines is revealed by the constant updates to nuclear architecture [
69].
The GQ Architecture of Retroviruses
The simplest example of GQ mediated integration may be provided by retroviruses, such as the human immunodeficiency virus 1 (HIV-1). These viruses encode G-flipons in the long terminal repeat that is present at either end of their 9.6 kb genomic insert [
70] (
Figure 4B). The arrangement enables the formation of chromatin loops that separate the viral protein coding genome from that of the host. In this state, the virus is likely latent. Nevertheless, the virus is poised to replicate on removal of the loop restraint (
Figure 4B). The HIV-1 plus strand mRNA also contains 11 potential G-quadruplexes with 9 in the coding sequence. The topologies are mixed, raising the possibility that particular pairings affect the splicing, stability, recombination, and repair of transcripts [
71]. Long interspersed nuclear elements (LINEs) are another class of retrotransposons of similar length to retroviruses that have a G-flipon conserved in their 3'UTR (untranslated region). The pairing of the LINE GQ with GQ in cellular enhancers has the potential to form a loop that controls their expression in a tissue-specific manner [
72]. Conversely, the 5’UTR G-flipons that LINE families acquire during evolution can themselves act as tissue specific enhancers of cellular genes [
73].
G-flipon functions within the cell and their modulation by G-flipon cycles are described below.
Cell Division
Interestingly, the first evidence hinting at a biological role for GQ came from the roundworm
Caenorhabditis elegans. Sequences with the G-quadruplex motifs underwent deletion in strains with
dog-1 (deletions of guanine-rich DNA) LOF variants, but not those sequences with only 3 G
3 repeats that are unable to form GQ [
74]. Mutant strains of
dog-1 lacking the translesion synthesis (TLS) polymerases POL eta and POL kappa had significantly more G-tract deletions than dog-1 by itself [
75]. Interestingly, the combined deletion of
dog-1 and the spindle-checkpoint component
mdf-1 enabled long term survival [
76], even though a high incidence of lethal mutations in this strain was revealed by the use of balancer chromosomes. In total, 126 (13%) of the 954 mono-G/C tracts larger than 14 bp, were deleted over 470 generations when both genes were absent. A role for GQ in sister chromatid alignment by the cohesin proteins during mitosis was suggested by effects of
dog-1 LOF on the spindle checkpoint. The absence of other phenotypes also supported the consensus that GQ had only a limited role in normal cell biology, not only in
C. elegans, but also in other organisms.
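The reported deletion frequency can be converted into an approximate per-generation rate. The sketch below assumes, purely for illustration, that each tract is lost independently with a constant probability per generation; this model is not taken from the cited study, and the variable names are hypothetical.

```python
# Reported values: 126 of 954 tracts deleted over 470 generations.
deleted, total, generations = 126, 954, 470

fraction_lost = deleted / total  # the quoted 13%

# Under the assumed model, P(lost after n generations) = 1 - (1 - p)**n,
# so p = 1 - (1 - fraction_lost)**(1 / n).
per_generation = 1 - (1 - fraction_lost) ** (1 / generations)

print(f"fraction lost: {fraction_lost:.3f}")
print(f"per-tract, per-generation deletion probability: {per_generation:.2e}")
```

Under these assumptions the loss probability is on the order of a few in ten thousand per tract per generation, which fits with the long-term survival of the strain despite the accumulated deletions.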
Epigenetic Maintenance. The
dog-1 homolog in the DT40 chicken lymphoblastoid cell line, the 5' FANCJ (Fanconi Anemia Complementation Group J) helicase (a member of the Fe-S superfamily 2 (SF2)) [
77] also was found to prevent deletion of guanine repeats (G-repeats) with the potential to form GQ. Effects of the mutation were enhanced by loss of the REV1 polymerase that localizes TLS to sites of polymerase stalling. Interestingly, REV1 catalytic activity was not necessary to prevent deletion, although the LOF variant did enhance the rate of G-repeat loss. In the FANCJ model, the combined deletion of the Werner and Bloom Syndrome 3' helicases (RecQ SF2) [
78] also increased G-repeat deletion, likely because of GQ accumulation [
77].
Of interest was that the TLS pathway was required to maintain the epigenetic state of dividing cells, as monitored by cell-surface expression of a protein with an intronic G-flipon that regulated gene expression. Whereas in the wildtype cell, the histone modifications associated with this G-flipon were maintained, they were lost following
rev1 deletion. Instead, resolution of the GQ formed during DNA replication was through the gap-fill repair pathway. The subsequent incorporation of unmodified histones led to diminished gene transcription and surface marker expression. This
rev1-dependent phenotype could be reverted by re-expression of human FANCJ helicase [
77]. The opposite effect was observed when a G-flipon was experimentally inserted into a repressed locus. In this case,
rev1 deletion led to derepression of the segment, consistent with the replacement of repressive histones with unmodified histones that were permissive to gene expression [
79]. These results support a model where the formation of GQ by G-flipons during periods of cell proliferation helps in transmitting the current epigenetic state to progeny, an important biological outcome.
DNA replication and Sister Chromatid Conformation. The involvement of GQ in cell proliferation is further supported by other evidence. During assembly of the DNA polymerase complex at the origin of replication (OOR), the MTBP protein assists in the loading of CDC45 into the replicative helicase. The C-terminal domain of MTBP binds GQ in vitro [
80]. Notably, G-flipons are enriched in OOR. Indeed, in chicken DT40 cells a minimal, functional OOR consists of a 90 bp fragment that has two G-flipons on the same strand [
81]. These constructs establish the nucleosome depleted region (NDR) bounded by histone H2A.Z that is typical of the OOR. Collectively, the results suggest a model in which the MTBP binds GQ at the OOR to initiate the assembly of the replication complex.
Another potential role for GQ during proliferation and transmission of epigenetic state is to align sister chromatids, as mapping of intra- and inter-chromatin interactions between homologous chromosomes reveals a high degree of symmetry in the architecture of topologically associating domains (TADs), and in the loops formed within TADs [
82]. In this regard, a recent report suggests that G-flipons are enriched near sites bound by the CTCF (CCCTC-binding factor), a protein associated with loop formation. Interestingly, the strand orientation of the G-flipons mirrors the inverse orientation of the two CTCF sites that associate with each other to form the base of the loop [
83]. CTCF however is not known to bind GQ [
51].
DNA Repair
G-flipons in nucleotide excision repair (NER). The REV1 pathway also plays a role in NER that is triggered by UV irradiation and the formation of DNA crosslinks. In this situation, loading of repair pathway proteins such as XPC and RAD23 is triggered by the protein ZRF1 and its yeast homolog Zuo1, which recognize the lesion and induce GQ formation [
84]. Triggering of this pathway by cytosine deaminases can result in a single base substitution (SBS) signature, with a C to G transversion resulting from the preferential insertion of cytidine opposite the lesion by REV1 [
85]. This mutational signature (SBS13) is prevalent in cancers [
86].
NER in the transcription coupled repair pathway (TCR) depends on the Cockayne Syndrome B (CSB) helicase (encoded by ERCC6) that binds GQ [
87]. On sensing a lesion, CSB displaces DSIF (DRB Sensitivity Inducing Factor) from the RNA polymerase 2 (RNAP2) complex, inducing a conformational switch that halts transcriptional elongation and initiates TCR [
88]. LOF variants of CSB are associated with premature aging phenotypes [
87].
G-flipons in base excision repair (BER). APE1 plays a similar role in stabilizing GQ formed by AP DNA, but not unmodified DNA, to initiate the BER pathway [
59]. The pathway removes oxidized bases, such as 8-oxo-G. It is proposed that regulation of APE1 by acetylation coordinates the expression of genes involved in cellular pathways that respond to oxidative damage. Interestingly, the GQ involved are formed from G-flipons with a “spare tire” (
Figure 1F). The extra runs of G-repeats allow formation of a GQ despite damage to one of the other repeats [
89].
The 8-oxoG modification can arise due to toxins in the environment. The adduct is also generated during the flavin-dependent LSD1 (lysine demethylase 1A, encoded by KDM1A) demethylation of H3K9me2, where hydrogen peroxide is a product of the reaction. The LSD1 enzyme is activated during the induction of BCL2 gene expression by estrogen [
59]. The repair of the lesion through the BER pathway depends on GQ formation. Before the involvement of GQ in this process was known, it was proposed that DNA strand breaks were a general mechanism for initiating gene transcription [
90].
Hemin and Oxidative damage. Another cause of oxidative damage is the production of highly reactive oxygen species catalyzed by hemin, an iron-containing porphyrin that is present at high concentration in the cell [
91]. Hemin binds with high affinity (Kd ~ 10 nM) to GQ, an interaction that was initially highlighted for its ability to increase production of superoxide [
92]. However, it appears that in cells this reaction is squelched, presumably by proteins that bind to GQ [
91]. GQ may then act as a sink for free hemin and trigger the rapid repair through the BER pathway of any damage hemin causes. In such cases, GQ protect rather than damage the genome.
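To give a sense of what a Kd of about 10 nM implies for GQ acting as a hemin sink, the sketch below computes the fraction of GQ sites occupied at a few free hemin concentrations. It assumes simple 1:1 equilibrium binding with no competition from GQ binding proteins; the concentrations chosen and the function name `fraction_bound` are hypothetical and are not taken from the cited work.

```python
def fraction_bound(free_ligand_nM, kd_nM=10.0):
    """Fraction of sites occupied for simple 1:1 binding: L / (L + Kd)."""
    return free_ligand_nM / (free_ligand_nM + kd_nM)

# Illustrative free hemin concentrations in nM (assumed values).
for hemin_nM in (1.0, 10.0, 100.0):
    print(hemin_nM, round(fraction_bound(hemin_nM), 2))
```

Under this simple model, half of the sites are occupied when free hemin equals the Kd, and occupancy approaches saturation only at concentrations well above it.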
GQ and Telomeres
The formation of T-loops by telomeres described above does not rule out a role for GQ formation in telomere protection. Indeed, the TRF2/RAP1 complex protects telomeres from homologous recombination by repressing PARP1 localization to telomeres and by inhibiting the SLX4 resolvase that binds to HJs. Loss of TRF2 and RAP1 in both humans and mice leads to rapid telomere attrition, with increased rates of telomere deletion and fusion [
93]. TRF2 preferentially docks to rGQ rather than dGQ. The protein binds rGQ formed by the noncoding Telomeric Repeat-Containing RNA (TERRA) telomere transcript through an RG rich domain [
94]. Interestingly, the HIV retrovirus may form a dGQ to cap the DNA flap sequence produced during the pre-integration phase of reverse transcription, potentially protecting the end in much the same way as proposed for telomeres [
95].
Resolution of G-Quadruplexes
Implicit in the G-flipon cycle is the need to reset flipons to a resting state. As shown in
Figure 3, many helicases enable this outcome. The most studied example is the ATP dependent DEAH box SF2 helicase DDX36 (RHAU), a highly specific GQ resolvase that unwinds parallel dGQ. The enzyme makes helical contacts with the GQ end plate [
96,
97]. Binding by the helix alone has a relatively high Kd of 1 μM. The additional engagement of a 3' single-stranded dGQ tail by other residues accounts for the nM affinity of the enzyme for its substrate. Using a ratchet mechanism, the helicase disassembles the dGQ, one guanine at a time. The chemical energy derived from ATP is converted into a pulling force by rotation of the C-terminal domain. The twist provides access to the helicase core [
97]. In the absence of nucleotide, or in the ADP bound state,
D. melanogaster DDX36 stabilizes the GQ [
98].
The cocrystal structure of dGQ with the SF1
Thermus oshimai 5′-3′ Pif1 helicase shows the enzyme in an unwinding state with engagement of a single-stranded thymine repeat [
99]. The related yeast helicases
Pif1 and
Rrm3 cooperate to unfold a wide range of dGQ topologies, including those formed not only by telomeres, but also by centromeres and tRNA repeat sequences [
100,
101]. The enzyme unfolds dGQ in an ATP-dependent manner, unwinding both parallel and antiparallel dGQ [
99]. The interaction of Pif1 with the parallel stranded dGQ differs from that of DDX36. The contact is mediated by a cluster of amino acids, including two arginine/lysine cation-π interactions at either end of the dGQ, plus ionic contacts with the phosphate backbone. The SF2 RecQ BLM helicase also can unfold a range of dGQ folds through a variety of different mechanisms [
102]. Collectively, the helicases play key but distinct roles in flipping dGQ back to the B-DNA conformation.
G-Quadruplexes and Gene Expression
The SANT domain and gene expression. The widely held assumption is that a crystal structure of a protein engaged with B-DNA precludes an interaction with any other DNA conformation, especially if the substrate is bound with nM affinity. Of course, crystal structures by their nature represent a low energy state. The example of Rap1 is therefore instructive (
Figure 2). Prior to its role in telomere protection, Rap1 was characterized as a sequence-specific transcription factor that bound to a UAS (upstream activating site) in yeast [
103]. The base-specific interaction with B-DNA was confirmed by crystallographic study of a telomeric sequence (
Figure 2A) [
104]. Only later did crystal structures show that Rap1 also docked to GQ. Surprisingly, both DNA interactions involved the same helix, but a different face [
105] (
Figure 2B). The GQ contacts were hydrophobic, with the helix lying on the planar surface of the terminal tetrad, while the B-DNA contacts were consistent with those found for the UAS. Both interactions have a Kd≈20-30 nM [
105], yielding a switch that has two stable states (
Figure 2C). The switch state then depends on the context and the availability of helicases. The example illustrates the potential of flipons to switch the readout of genetic information from a genome by changes to their conformation [
106].
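The near-equal affinities quoted above can be restated as binding free energies. The sketch below converts a dissociation constant into a standard free energy with ΔG° = RT ln(Kd / 1 M). It assumes a temperature of 298.15 K and, for illustration only, treats the two ends of the quoted 20-30 nM range as if they belonged to the two bound states; the function name `binding_free_energy` is hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T = 298.15          # assumed temperature in kelvin

def binding_free_energy(kd_molar):
    """Standard free energy of binding, RT ln(Kd / 1 M), in kcal/mol."""
    return R_KCAL * T * math.log(kd_molar)

dg_20 = binding_free_energy(20e-9)
dg_30 = binding_free_energy(30e-9)
gap = dg_30 - dg_20   # equals RT ln(1.5)

print(f"dG at 20 nM: {dg_20:.2f} kcal/mol")
print(f"dG at 30 nM: {dg_30:.2f} kcal/mol")
print(f"difference:  {gap:.2f} kcal/mol (RT = {R_KCAL * T:.2f} kcal/mol)")
```

Under these assumptions the difference between the two bound states is smaller than RT, so binding affinity alone would not favor one state, in keeping with the idea that context and helicase availability set the switch.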
While this finding might seem anomalous, many subsequent studies have demonstrated the ability of proteins to bind specifically to a cognate B-DNA sequence, and also to a GQ. In both cases, the affinity is often nanomolar. This finding is true for binding of the SP1 transcription factor to the c-MYC parallel GQ [
107] and for a range of other proteins that bind GQ and a B-DNA motif [
21]. Like Rap1, many of the GQ binding proteins, including ZRF1 [
108] and TRF2 [
109,
110], contain a SANT/Myb domain. Interestingly, the yeast Zuo1 protein has replaced the SANT domain with a highly hydrophobic helix that could well interact with the endplate of a GQ [
108]. SANT domain proteins are found in multiple chromatin-modifying and remodeling complexes, although their interactions with GQ are not yet reported [
111].
GQ and transcription complexes. Given the enrichment of G-flipons in promoters, a key question was how the GQ stabilizing and resolving proteins impact transcription. GQ binding proteins like YY1 (Yin Yang 1) are known to form homodimers that promote enhancer-promoter contacts [
51,
112,
113]. So do transcription factors (TFs) that bind GQ. One of the surprises of the ENCODE project was the identification of HOT (high occupancy target) loci where upwards of 100 TFs bound, even to sites lacking their sequence-specific binding motif. The findings were initially dismissed as methodological artefacts [
114], but were later shown not to be so [
115,
116]. The primary studies focused on the sequence-specificity of TFs, not the GQs that were also formed at promoters. The ability of TF to bind both B-DNA and GQ offered a resolution to this HOT dilemma [
51]. Indeed, recent findings suggest that it is GQ formation that recruits TF to transcriptional hubs [
117]. In this new model, as described here, TFs play a different role. Through the complexes they anchor, TF localize helicases to resolve the GQ formed by promoters. A specific helicase might recognize a particular GQ fold, a GQ loop of particular length or composition, or display a preference for a 5' or 3' single-stranded flanking sequence. The biological outcomes then depend on the GQ topology and the helicase involved. The model explains the diversity of functions enabled by the G-flipon cycle (
Figure 3).
G-Quadruplexes and Transcriptional Bursts
One extension of this model is that docking of TF to GQ maintains a transcription state following its initiation by the binding of a sequence-specific TF to B-DNA. Consequently, there would be no need for any further sequence-specific interactions with the promoter. However, this possibility is not consistent with the observed rapid reset of promoters that occurs after each round of transcription [
118,
119]. The fast disassembly of the transcriptional complexes following each round of transcription is mirrored by the abrupt dissolution of promoter condensates triggered by the high levels of nascent RNAs produced [
120]. The evidence suggests that transcription occurs in bursts followed by a reset rather than by a preset level of expression.
Earlier experiments based on single molecule FISH suggest that the transcriptional burst frequency, but not the burst size, depends on the rate of promoter reset [
118]. One contribution to burst size is the frequency with which sister chromatids are transcribed. Curiously, only one allele is active at a time, rather than both undergoing simultaneous transcription [
118]. The localization of many different helicases to the locus may allow one allele to reload a sequence-specific TF to reform an initiation complex while the other one fires. Such coordinated activity is consistent with the symmetrical chromatin architecture observed for sister chromatids, as described above [
82]. The lack of co-bursting by maternal and paternal chromosomes is consistent with recent single cell studies of allele-specific transcription [
125].
Gene Repression
GQ and Gene Repression. The promoter reset occurs in competition with complexes that suppress gene expression. These competitors include the PRC2 complex that engages the GQ formed at promoters through the SANT domain of the EZH2 (enhancer of zeste 2) component. For active genes, binding of PRC2 to the GQ formed by a nascent RNA likely prevents engagement of the GQ formed by the single-stranded promoter DNA [
136]. However, in other situations, binding of a small RNA to the coding strand would promote GQ formation by the promoter DNA without the transcription of a GQ RNA competitor. In this situation, proteins, such as PRC2, that are localized to the site by the small RNA, would enhance formation of a repressive complex at the promoter. In these situations, the small RNA could be produced from a locus elsewhere in the genome [
130]. Indeed, the small RNAs that direct the
hiwi (human ortholog of
piwi)-mediated repression of human endogenous retroelements in early development are produced from over 6000 clusters [
137,
138,
139]. By localizing a different set of proteins to the site, small RNAs acting
in trans could also promote transcriptional activation (
Figure 4A). Such a role has been proposed for the other
piwi-related Argonaute family member complexes [
140,
141].
R-Loop Resolution
A number of mechanisms exist to regulate dGQ formation by R-loops (
Figure 3). For example, helicases such as SETX and RTEL1 can facilitate the flip of GQ back to B-DNA through the resolution of RNA:DNA hybrids [
142,
143]. Nucleases that digest the RNA strand of hybrids, such as RNaseH1, play an important role in their removal [
144]. Other proteins such as ATRX prevent R-loop formation at telomeres by sequestering RNA. Deletion of ATRX leads to increased formation of GQ at telomeres [
145].
Transactional Chromatin Looping and Transcript Elongation
In cellulo studies reveal that delays in RNAP2 transcript elongation occur at the CTCF binding sites involved in chromatin loop formation. CTCF binds to the large subunit of RNAP2 and the interaction is also associated with cohesin recruitment [
146,
147,
148]. Conversely, CTCF binding to DNA increases following deletion of the DNA methyltransferase DNMT1.
These findings are consistent with a model where stalling of the polymerase by CTCF results in an R-loop that promotes GQ formation at the site. The GQ structure produced then inhibits DNMT1, preventing DNA methylation of the locus by trapping the enzyme. The trap works because the binding affinity of DNMT1 is higher for GQ than for duplex, hemi-methylated or single-stranded DNA [
149]. The resolution of the GQ by helicases then allows redocking of CTCF to the original DNA site, leading to reinstatement of the chromatin loop formed with the promoter (
Figure 4). The CTCF binding sites necessary for this transaction lie in reverse orientation to each other. They are then fully aligned at the base of the loop and held in that state until the next splicing event [
83]. After the splicing complex is assembled, the flipon cycle then resets the DNA locus to await splicing of the next transcript.
DNA G-Quadruplexes and Splicing
How GQ formation by DNA affects splicing is therefore of considerable interest. Pausing of RNAP2 is associated with alternative splicing (reviewed in [
150]). The sites at which RNAP2 pauses have been investigated at nucleotide resolution. Careful in vivo measurements show dependence of pause sites on the structure of the RNA:DNA hybrid produced, but not on the canonical DNA motifs that form GQ [
151]. The lack of direct involvement of dGQ may reflect the action of the FACT (Facilitates Chromatin Transcription) complex in maintaining the existing epigenetic state by removing nucleosomes in front of the RNAP2 and replacing them behind the enzyme. This mechanism prevents the net accumulation of local DNA supercoiling that might otherwise change flipon conformation [
152].
However, CTCF-mediated looping is associated with alternative splicing and may allow dGQ to play an indirect role in splicing by keeping CTCF sites free of methylation. The role for CTCF is well substantiated. There is evidence that the DNA loops formed between promoter and the spliceosome mediate the transfer of various splicing factors that initially accumulate in promoter regions [
153,
154]. There is also ancillary evidence that R-loop formation at promoter sites promotes splicing [
155], consistent with a role for GQ in forming promoter/spliceosome condensates.
Alternative splicing is also associated with demethylated DNA, consistent with a role for CTCF-anchored loops in splicing. The deletion of DNMT1 enhances the alternative splicing of the CD45 transcript, as does inducing DNA demethylation by increasing expression of the TET1 (tet methylcytosine dioxygenase 1) and TET2 enzymes [
156,
157]. Interestingly, the complement of the degenerate RNAP2 pause motif given by Gajos et al. has a weak match to a CTCF motif (the orientation is inverted relative to those enriched at TSS). In this case, the inhibition of DNA methylation by GQ may provide a partial explanation for how this conformation can indirectly influence the selection of splice sites [
40].
The CTCF-dependent mechanism of connecting promoters with RNA processing condensates involved in splicing is quite flexible. For example, the multiple alternative splices of the protocadherin
Pcdh gene family connect the production of each isoform with a different active promoter [
158,
159]. Similar dependence on promoter selection is reported for other RNA processing steps in which the polyadenylation of transcripts occurs at different sites [
160,
161] (
Figure 4). In both outcomes, GQs potentially prevent the loss of CTCF binding sites by inhibiting DNA methylation of the locus. The GQ also localizes proteins with roles in the splicing and polyadenylation. The many proposed GQ binding proteins involved are listed in [
60], in the G4IPDB and QUADRatlas databases, with a validated subset given in [51; He, 2023].
RNA G-Quadruplexes and Splicing
rGQ can also form in the RNA transcripts produced, including those with only two tetrads [
34] and those folded with non-contiguous G nucleotides [
48]. These structures have the potential to alter RNAP2 elongation rate and the RNA processing performed [
40,
41]. For example, the splicing factors U2AF65 and SRSF1 bind to GQ RNA with nanomolar affinity, each showing specificity for different GQ substrates [
162]. The small molecule cephaeline and the related compound emetine are both reported to impair the formation of GQ by RNA. Both compounds globally disrupt alternative RNA splicing [
163].
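As a rough illustration of how candidate rGQ-forming sequences of the kinds mentioned above can be flagged in a transcript, the following Python sketch scans a sequence with the canonical motif (G3N1-7)3G3. The relaxed two-tetrad variant (G2N1-7)3G2 is an assumption made here for illustration only, as are the function name and the example sequences. A match indicates only sequence potential; it says nothing about folding in cells and does not capture GQ formed from non-contiguous G nucleotides.

```python
import re

def find_putative_gq(seq, tetrads=3, min_loop=1, max_loop=7):
    """Return (start, end, matched_sequence) for putative GQ motifs.

    The pattern is (G{n} N{min_loop,max_loop}) x 3 followed by G{n}, where
    n is the number of tetrads. DNA input is converted to RNA letters.
    Matches are non-overlapping, so nested candidates are not reported.
    """
    rna = seq.upper().replace("T", "U")
    g_run = "G{%d}" % tetrads
    # Lazy loops so that the shortest loops consistent with a match are used.
    loop = "[ACGU]{%d,%d}?" % (min_loop, max_loop)
    pattern = re.compile("(?:%s%s){3}%s" % (g_run, loop, g_run))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(rna)]


# Hypothetical example sequences, not taken from any cited study.
three_tetrad = "AAGGGAGGGUGGGCGGGAA"
two_tetrad = "AAGGAGGUGGCGGAA"

print(find_putative_gq(three_tetrad))             # canonical three-tetrad scan
print(find_putative_gq(two_tetrad, tetrads=2))    # relaxed two-tetrad scan
```

Setting `tetrads=2` greatly increases the number of hits in any real transcriptome, which is one reason why mapping results depend so strongly on the definitions and methods used.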
GQ formation may also alter the co-transcriptional N6-methyladenosine (m6A) modification of RNA. It has been proposed that this epigenetic mark can affect splice site selection, but that issue is unresolved [
164,
165,
166]. The involvement of rGQ in m6A modification is also controversial. Interestingly, the methyltransferase METTL3/METTL14 heterodimer that writes m6A within the consensus DRACH motif (D = A, G, or U; R = A or G; H = A, C, or U) binds preferentially to rGQ structures through its RGG domain [
167,
168]. In addition, the RBM15 protein, which also binds rGQ, localizes METTL3 to certain transcripts and to a subset of H3K36me3 marks [
51,
166,
169]. The mapping of GQ and m6A to splice junctions is dependent on the methods used. Over 81% of GQ that map in HeLa cells are formed from only 2 tetrads that can stably fold into rGQ [
164]. The mapping frequency also depends on the m6A detection protocol employed and the cell line studied, varying from 14% in HeLa cells to 40% in HEK cells [
164]. More recent methods are even more sensitive than those used in the earlier analysis, but reproducibility across studies remains a problem [
170]. Current mappings do not reveal any enrichment of the DRACH motif in GQ loops, suggesting that rGQ might localize METTL3 to modify sequences in their neighborhood [
164]. Alternatively, m6A modification may inhibit rGQ formation, as seen for GGA repeats [
171]. Interestingly, m6A bases are read by heterogeneous nuclear ribonucleoproteins (hnRNPs) involved in alternative splicing, such as hnRNP C and hnRNP A2B1 [172; Ye, 2024].
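The DRACH consensus given above translates directly into a pattern search. The Python sketch below is a minimal illustration, assuming a plain RNA string as input. It lists candidate m6A sites and separates those inside a given rGQ interval from those in its neighborhood, mirroring the distinction drawn in the text between DRACH motifs in GQ loops and in flanking sequence. The function names, the `flank` parameter and the example sequence are hypothetical, and a sequence match does not imply that a site is methylated.

```python
import re

# DRACH as defined in the text: D = A, G or U; R = A or G; H = A, C or U.
# The lookahead allows overlapping motifs to be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")

def find_drach_sites(seq):
    """Return (index_of_central_A, motif) for each DRACH match (0-based)."""
    rna = seq.upper().replace("T", "U")
    return [(m.start() + 2, m.group(1)) for m in DRACH.finditer(rna)]

def classify_sites(sites, gq_start, gq_end, flank=50):
    """Split sites into those inside [gq_start, gq_end) and those within
    `flank` nucleotides on either side of that interval."""
    inside = [s for s in sites if gq_start <= s[0] < gq_end]
    nearby = [s for s in sites
              if s not in inside
              and gq_start - flank <= s[0] < gq_end + flank]
    return inside, nearby


# Hypothetical example: one DRACH site upstream of an assumed rGQ interval.
sites = find_drach_sites("UUGGACUUU")
print(sites)
print(classify_sites(sites, gq_start=10, gq_end=20, flank=10))
```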
The role of m6A in splicing was also investigated in genetically modified animals. The expression of a hypomorphic METTL3 allele in mouse embryonic stem cells did not appear to change splicing patterns, although there was slower turnover of many of the wildtype m6A modified RNAs [
165]. Further, in wildtype cells, the distribution of m6A in processed nuclear mRNAs was similar to that found in cytoplasmic mRNAs. Around 70% of the observed m6A sites were in terminal exons, with ~70% in the 3' UTR. With chromatin associated RNAs that were not completely processed, ~93% of the m6As in the partially spliced transcripts were in exons and only ~10% of m6As were within 50 nucleotides of 5' or 3' splice sites. Notably, methylation was mostly performed before splicing [
173].
Rather than working with a genomic knockout, another group examined the immediate effects of acute depletion of METTL3 protein. This approach was designed to minimize the downstream effects on the expression of other genes resulting from METTL3 loss. Around 6%–10% of high-confidence m6A regions were mapped to introns, mainly in protein coding genes, either around stop-codon regions or at the beginning of the 3′ UTR. The loss of METTL3 disrupted inclusion of alternative introns/exons in the nascent transcriptome, particularly at those 5' splice sites proximal to m6A peaks, suggesting that the sites were occluded or the isoforms were protected by proteins bound to m6A. Among those genes showing altered splicing were those encoding proteins for m6A modification (
Wtap, Ythdc1, Ythdf1, and Spen), suggesting a negative feedback regulatory mechanism that would be absent in cells with METTL3 deleted from the germline [
166]. Overall, the different results for GQ RNA formation at splice sites and METTL3 deficiency are consistent with a model where rGQ folding in introns can promote m6A modification of exons, with rapid degradation of splicing isoforms with retained introns marked by m6A.
G-Quadruplexes and Translation
GQ and ribosome assembly. rGQs appear to play an important role in ribosome structure and maturation, with ribosomal RNAs enriched for G-flipons [
174]. Many ribosomal proteins have been identified as rGQ ligands in different screens [
62,
162]. Further, rGQ binding and resolving proteins such as nucleolin and nucleophosmin help structure the nucleolar condensates that guide ribosome assembly [
56,
175,
176,
177].
rGQ and translation. rGQ formation by mRNA is the subject of much interest, especially in the untranslated regions that regulate translation. These exons contain alternative translation initiation sites and microRNA (miR) binding sites that affect the production of different protein isoforms. The complexities involved are described in a number of recent reviews. The articles provide examples of how rGQ in the 5' UTR can switch the use of start codons to produce completely different protein products, while rGQ in the 3' UTR can modulate the translation of mRNAs and interactions with small regulatory RNAs such as miRNA [
178,
179,
180,
181]. Analysis of G-flipons in 5'- and 3'UTR provides evidence of positive selection, which can alter the alternative splicing of these exons. Single nucleotide variants in both 5'- and 3'UTR are associated with quantitative trait loci [
182]. Bioinformatic approaches have also been used to identify G-flipon RNA binding proteins, as annotated in the QUADRatlas database.
By modulating mRNA translation, rGQs contribute in many ways to phenotypic pliability [
28]. Here helicases such as DHX36 and CCHC-type zinc finger nucleic acid-binding protein (CNBP/ZNF9) play a central role in promoting mRNA translation by resolving rGQ [
183,
184]. The m6A modifications of RNA that are associated with rGQ formation during transcription (as described above) also impact translation. The removal of these marks from the 5' UTR near the start codon by the m6A erasers AlkB homolog H5 (
ALKBH5) and fat mass and obesity-associated protein (
FTO) decreases ribosome translational pausing, increasing protein synthesis [
185]. Such m6A modifications also dynamically regulate heat shock responses by enhancing N7-methylguanosine cap-independent translation [
186]. Further, the class I cytoplasmic m6A readers, YTHDF1 and YTHDF3, promote the degradation of target transcripts [
187], potentially eliminating partially processed transcripts with retained introns. The endogenous repeat elements present in these introns, such as ALU SINE inverted repeats, might otherwise activate dsRNA and Z-RNA dependent immune responses [
132]. The potential of rGQ to enhance m6A modifications provides additional mechanistic insight into how G-flipons increase phenotype pliability by regulating RNA dependent epigenetic outcomes.
G-Quadruplexes and Development
Pioneering Factors. Other mechanisms exist for the induction of alternative flipon conformations. Sequence-specific pioneering transcription factors, such as HNF4 and GATA4, can dock to their motifs on nucleosome bound DNA. The master regulators of embryonic development then localize complexes that evict histone octamers from the locus, generating a negatively supercoiled NDR at the site [
188,
189]. The energy released by removal of a nucleosome is sufficient to induce a number of different alternative DNA conformations [
190]. The relaxation of these structures to B-DNA is sufficient to power the assembly of the different biological machines that actuate alternative cellular responses (
Figure 3).
GQs are able to facilitate a number of different processes in the cell that are directed by sequence-specific TF. Small noncoding RNAs, such as those used in the
piwi system to regulate endogenous retroelements [
191], provide another means by which GQ formation can be regulated in a sequence-specific manner. In both cases, the alternative flipon conformations engage the same structure-specific cellular machinery. The question arises as to how these two different systems for sequence-specific regulation of gene expression and RNA translation are used to coordinate development, especially during early embryogenesis. To explore the role of small RNAs in this process, the sequence-specific match between experimentally confirmed flipons and miR highly conserved in eutherian mammals was explored. Intriguingly, promoters with miR matches to G- and Z-flipons were highly enriched in developmental genes (FDR < 10^-100), consistent with a role in early development [
130].
Notably, GQ are enriched in human embryonic stem cells (hESC). About 18,000 GQ were mapped to NDR as defined by ATAC-seq. Following differentiation into neural stem cells and cranial neural crest cells, the number of detectable GQ was reduced by 25-50%, with findings differing by lineage [
192]. In hESC, GQ were mapped to ~50% of bivalent promoters that contain both active H3K4me3 and repressive H3K27me3 marks and are lowly transcribed. The GQ in hESC overlapped sites bound by CTCF (~36%), the cohesin component RAD21 (~50%) and RING1B that mediates repression by recruiting PRC1 to R-loops (~55%) [
193]. Differentiation was associated with the loss of bivalent promoters reflecting the potential of GQ to localize either activating or repressive protein complexes during lineage specification. Collectively the results are consistent with a model where small RNAs bootstrap development, much in the same way a computer loads an initial program to specify the inputs and outputs that are necessary for an operating system to run. Here, the programming of flipon conformation by small RNAs would establish epigenetic marks to template tissue differentiation by sequence-specific B-DNA binding proteins. The bootstrapping by small RNAs that occurs after the erasure of existing parental epigenetic marks early in development could potentially involve miR transmitted by either maternal or paternal gametes [
194,
195,
196,
197]. Further research is needed to address such mechanisms.
Summary and Outlook
Flipons are genetically encoded elements that dynamically change their conformation under physiological conditions without requiring strand cleavage or a change in sequence. They vary by the non-B-DNA structure they form. Z-flipons flip rapidly, with an in vitro relaxation time of 100 ms, and have ancient, well documented roles in self-recognition and immunity through the structure-specific interaction with the Zα domain [
132]. G-flipons are much more stable, with higher melting temperatures than their B-DNA structure. Yet, like Z-flipons, GQ are formed and resolved dynamically to perform a number of important biological roles (
Figure 3). Flipons that form triplexes are also likely to influence gene expression and development [
198,
199], with examples related to the hemoglobin locus [
200], stabilization by histone H3 tails [
201,
202], and binding of triplex DNA by the Drosophila GAGA protein through the same domain that engages B-DNA in a sequence-specific manner [
203]. Triplex forming sequences are also enriched in repeat elements, such as ALU SINEs (short interspersed nuclear elements) that form part of the repetitive genome [
106]. Their biology may reflect the RNA motifs they deliver to a locus that engage both sequence- and structure-specific proteins that scaffold formation of various chromatin modifying complexes [
204].
Based on a dynamic form of encoding, flipon biology can be best visualized as a cycle that exchanges energy for information. The flip to an alternative conformation is regulated both genetically and by environmental events, is tuned by base modifications that enhance or suppress the transition, and depends upon proteins and noncoding RNAs that modulate the formation or resolution of the alternative conformation. These modulators are also subject to modification to tune the cycle. Other factors also affect the equilibrium by binding in a sequence-specific manner to the right-handed B-DNA conformations or to single-stranded RNA. While it has been usual to consider the effects of evolution on the individual components involved in cellular processes, the optimization of so many different parameters represents a combinatorially challenging calculation full of cascading complexity, similar in logic to the epicycles once used to predict planetary orbits in a bygone era. Instead, flipons offer a simpler alternative to optimize context-specific responses that allow rapid adjustments of cellular state. By programming and refreshing epigenetic state, flipons facilitate the formation and maintenance of cellular memory [
2]. Here, the various ways in which G-flipons impact a wide variety of biological processes are described, with a focus on recent experimental validation of GQ and descriptions of the current unknowns.