1. Introduction
DNA replication is a fundamental process by which all living organisms duplicate the genetic information of their cells accurately and in a timely manner before the cells divide into two genetically identical daughter cells [1,2,3,4,5,6,7,8,9]. In eukaryotes, DNA replication takes place in a specific time window of the cell division cycle called S phase, or synthesis phase [10,11]. Replication of eukaryotic chromosomal DNA requires tight regulation to ensure that the genome of each cell is duplicated once and only once per cell cycle [5,6,8,12,13]. Failure to do so can have detrimental consequences and lead to diseases such as cancer. At the molecular level, DNA replication is highly conserved in all eukaryotes [1,2,3,4,5,6,8,9]. However, major differences exist between humans, yeast, and other eukaryotes and have been studied [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. For example, origins of replication have specific sequence requirements in some yeasts such as Saccharomyces cerevisiae, whereas in most other eukaryotes replication origins are defined by protein-DNA complexes with little or no dependence on DNA sequence [15]. Moreover, the regulation of origin recognition by the conserved origin recognition complex (ORC) differs between eukaryotes. Yeast ORC binds to replication origins constitutively, whereas in other organisms one subunit of ORC is degraded in a cell cycle-dependent manner [16,17,18,19,20,21,29,30]. Additionally, although the replicative helicase, the CDC45-MCM2-7-GINS (CMG) complex, is conserved from yeast to humans, its assembly and activation at replication origins require several factors. In yeast, DNA polymerase ε (Pol ε) and MCM10 support its assembly and activation, whereas in multicellular organisms the protein Donson, which has not yet been found in S. cerevisiae, assists the assembly of the CMG complex [28]. Interestingly, mutations in Donson, MCM2-7, CDC45, and other replication initiation factors have been found to cause genetic diseases such as Meier-Gorlin syndrome, highlighting the importance of understanding the mechanism of initiation in humans [9,22,24,25,26,27,28].
2. Establishing DNA Replication Forks—Origins of DNA Replication and Chromatin-Loading of Replication Proteins
The DNA replication process is divided into three separate steps: initiation, elongation, and termination [1,2,3,8]. The initiation step results in the unwinding of double-stranded DNA and the formation of replication forks (RFs), Y-shaped DNA structures that provide the two templates on which primase synthesises the initiator RNAs (ribonucleic acids) and the replicative DNA polymerases synthesise new DNA (deoxyribonucleic acid) during the elongation step. When two opposing RFs from different origins meet, DNA synthesis ends (termination) [1,2,3]. In this review, we focus on the initiation reactions, which are highly regulated in all living organisms.
Replication origin activation during the initiation phase of DNA replication and RF establishment have been studied best in yeast and Xenopus laevis, for which biochemical replication systems based either on purified proteins or on Xenopus oocyte extracts have been established and are supported by genetic studies, especially in yeast. The yeast biochemical DNA replication system uses double-stranded DNA (dsDNA) containing an origin of DNA replication, whereas the Xenopus laevis system, which is based on embryonic proteins, works with long stretches of DNA devoid of specific origin sequences, on which the established origins are spaced ~15 kb apart [14,15,31]. Recent studies using synthetic, pre-assembled RFs and purified proteins have advanced the understanding of the elongation phase, and these approaches can be combined with cell biological studies and modern technologies, including next-generation sequencing and structural biology techniques [32,33,34,35,36,37,38,39]. Although origin-dependent replication initiation in higher eukaryotes is not as well understood as in yeast, combining the results of multiple model systems has yielded a picture of highly conserved processes with subtle differences, as discussed below (summarised in Figure 1, [14,15,16,17,18,19,20,21,22,24,25,26,27,28]).
All DNA replication processes start at specific sites in eukaryotic genomes named origins of DNA replication. These replication origins are well defined in the yeast S. cerevisiae and closely related yeasts; they are AT-rich and show sequence-specific requirements ([15], Figure 1). In contrast, origins in other yeasts such as Schizosaccharomyces pombe (S. pombe) and in higher multicellular organisms are less well defined and have less stringent sequence requirements. Origins of S. pombe and related yeasts preferentially contain AT-rich sequences, to which the origin recognition complex (ORC) binds via the AT-hook in ORC4, but otherwise no sequence specificities that define these origins are known. In metazoans, on the other hand, origins are GC-rich, are defined epigenetically, and are often associated with promoter regions [15]. Nevertheless, the conserved six-subunit ORC, first described in yeast, binds to all eukaryotic origins, serves as a landing platform for replication initiation factors, and allows the origin-dependent initiation process to take place [15,40]. In metazoan somatic cells, origins are spaced far apart, ~50-100 kilobases (kb), whereas in embryonic cells they are spaced as close as ~15 kb from each other [14]. In summary, replication origins are chromosomal DNA sites that ORC recognises in an ATP-dependent manner and that form the starting point of DNA replication or, in metazoans, the region in which DNA replication starts [15].
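The inter-origin distances above translate directly into how many origins a cell must fire. The back-of-the-envelope sketch below assumes a haploid human genome of ~3.1 × 10^9 bp (a figure not given in this review), evenly spaced origins, and that every origin fires; real origin usage is uneven and flexible, and the ~15 kb embryonic spacing comes from Xenopus, so applying it to a human-sized genome is purely illustrative.

```python
# Back-of-the-envelope estimate: how many evenly spaced origins would be
# needed to tile a genome at a given inter-origin distance?
HAPLOID_HUMAN_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size


def origins_needed(genome_bp: float, spacing_bp: float) -> int:
    """Number of origins if every origin fires and origins are evenly spaced."""
    if spacing_bp <= 0:
        raise ValueError("spacing must be positive")
    return round(genome_bp / spacing_bp)


# Spacings quoted in the text: 50-100 kb (somatic) and ~15 kb (embryonic).
for label, spacing_kb in [("somatic, 100 kb", 100), ("somatic, 50 kb", 50), ("embryonic, 15 kb", 15)]:
    n = origins_needed(HAPLOID_HUMAN_GENOME_BP, spacing_kb * 1_000)
    print(f"{label:>17}: ~{n:,} origins per haploid genome")
```

Under these assumptions the somatic spacings correspond to roughly 31,000-62,000 origins per haploid genome, about twice that per diploid cell, and the embryonic spacing to roughly 200,000.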
Figure 1.
The initiation process at eukaryotic origins of DNA replication. At the end of mitosis and in early G1 phase of the cell cycle, ORC, the origin recognition complex, binds to eukaryotic origins of replication together with CDC6. In G1, CDT1 together with the MCM2-7 complex then binds the CDC6-ORC complex, and the MCM2-7 proteins are loaded onto the chromatin as helicase-inactive double hexamers (MCM-DHs), forming the pre-replicative complex (pre-RC) and licensing the origin. Modification of CDC6 and binding of CDT1 to Geminin inactivate the loading activity of these proteins; modified CDC6 is degraded, as is free CDT1. In the next step, Treslin-MTBP (SLD3-SLD7 in yeast) interacts with MCM-DHs at the chromatin, and the DBF4/DRF1-CDC7 kinase (DDK) phosphorylates the MCM2-7 proteins. RIF1-PP1 can reverse the DDK-dependent phosphorylation, making this step reversible. Next, cyclin-CDKs phosphorylate Treslin and stimulate the formation of Donson-TOPBP1 complexes, which in turn bind to MCM-DHs. Donson-TOPBP1 supports the loading of the GINS complex and its association with MCM-DHs. Binding of CDC45 leads to the formation of the CMG complex and its activation, whereas TOPBP1 and Treslin-MTBP are released from the chromatin. CryoEM data suggest that Donson associates with CMG as a dimer, but only one Donson subunit binds to GINS and MCM2-7 proteins, stabilising the CMG complex [28]. Subsequently, two replication forks (RFs) are formed, and replication protein A (RPA), with the help of CDC45, binds and stabilises the resulting ssDNA. The association of AND-1/CTF4/WDHD1 (shown as AND-1 in the diagram) with CMG allows the loading of an inactive Pol α (dark green), including its primase subunits, to RFs. The activation of Pol α (light green) permits the primase subunit PRIM1/PRI1, with the help of PRIM2/PRI2 and additional replication factors, to synthesise the first RNA primer in origin sequences, completing the initiation process at origins and starting the elongation phase. Additional RF-associated proteins, such as the fork-stabilising proteins Timeless, Tipin, and Claspin as well as Pol ε [3,41,42], were omitted from the diagram for simplicity and clarity. Adapted from [3,26,27,28,43,44,45] and created with BioRender.com.
During initiation in eukaryotes, these ORC-DNA complexes serve as landing platforms for additional proteins, namely CDC6 and CDT1-MCM2-7, in G1 phase of the cell cycle, when the activity of cyclin-dependent kinases (CDKs) is low [3,9,30,46,47,48,49]. Here, MCM2-7 hexamers bind to the origin chromatin and form a head-to-head dimer on the DNA, the MCM2-7 double hexamer (MCM-DH) [50,51]. These proteins form the so-called pre-RC, which is the first step in the formation and activation of the replicative DNA helicase with MCM2-7 as its core, and which results in the licensing of replication origins [3,12,52]. In the next step, CDC6 and CDT1 leave the complex and are inactivated in an organism-dependent manner; for example, CDC6 is phosphorylated and degraded. CDT1, in contrast, binds Geminin and forms a stable, inactive CDT1-Geminin complex, whereas free CDT1 is proteolytically degraded [53,54]. In some organisms, including humans, CDT1 is completely degraded by ubiquitin-dependent proteolysis in late G1/early S phase [55]. Subsequently, the proteins Treslin and MTBP, named SLD3 and SLD7 in yeast, bind to MCM-DHs [2,3,55]. The DBF4/DRF1-CDC7 kinase (DDK) then phosphorylates MCM-DHs, preparing them for the next step. Cyclin-CDK complexes, e.g., CycE-CDK2, phosphorylate Treslin in complex with MCM-DHs and stimulate Donson-TOPBP1 complex formation. TOPBP1 has Dpb11 as its equivalent in budding yeast, whereas no Donson-equivalent protein has yet been found in yeast [28]. In yeast, MCM10 and/or Pol ε may take over some functions of Donson during CMG formation [37,50,56,57]. TOPBP1 then binds to the GINS complex and to phosphorylated Treslin (P-Treslin), bringing the GINS proteins to the MCM-DHs located at origin sequences. This complex formation allows CDC45 to associate with GINS and MCM2-7, forming the CMG complex, the eukaryotic replicative helicase [22,24,25,26,27,28]. Next, TOPBP1 and the Treslin/MTBP complex leave the CMG complexes, but two Donson molecules remain associated with each active CMG complex [22,28]. These activated helicase complexes start to untwist and unwind the dsDNA, establishing replication bubbles at origins, and move in opposite directions, forming two RFs at each active origin [3,50]. To stabilise the ssDNA and protect it against nucleases, CMG complexes load the eukaryotic ssDNA-binding protein replication protein A (RPA), a heterotrimer consisting of RPA70, RPA32, and RPA14, onto the unwound ssDNA strands [58,59,60,61]. Pol α then associates with the CMG complex via the AND-1/CTF4/WDHD1 homotrimer, and the primase function of Pol α in turn forms RNA primers on ssDNA templates at origins, marking the start of eukaryotic DNA replication [1,9,45,60,62,63].
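The once-and-only-once rule follows from separating licensing (low CDK activity in G1) from firing (kinase activity in S phase). The following is a minimal, hypothetical state-machine sketch of a single origin under that assumption; the class and method names are illustrative inventions, several steps (Treslin phosphorylation, Donson-TOPBP1, GINS and CDC45 recruitment) are collapsed into one `fire` call, and DDK action is placed in S phase as a simplification.

```python
from enum import Enum, auto


class Phase(Enum):
    G1 = auto()   # low CDK activity: licensing window
    S = auto()    # high kinase activity: firing window
    G2M = auto()


class State(Enum):
    ORC_BOUND = auto()   # ORC (with CDC6) on the origin
    LICENSED = auto()    # MCM2-7 double hexamer loaded (pre-RC)
    DDK_MARKED = auto()  # MCM-DH phosphorylated by DDK
    FIRED = auto()       # CMG assembled; two forks leave the origin


class Origin:
    """Hypothetical toy model of one origin; names are illustrative only."""

    def __init__(self) -> None:
        self.state = State.ORC_BOUND

    def license(self, phase: Phase) -> bool:
        # CDC6/CDT1-dependent MCM-DH loading is only allowed in G1.
        if phase is Phase.G1 and self.state is State.ORC_BOUND:
            self.state = State.LICENSED
            return True
        return False

    def ddk_phosphorylate(self, phase: Phase) -> bool:
        if phase is Phase.S and self.state is State.LICENSED:
            self.state = State.DDK_MARKED
            return True
        return False

    def rif1_pp1_reverse(self) -> bool:
        # RIF1-PP1 can undo DDK phosphorylation (the reversible step).
        if self.state is State.DDK_MARKED:
            self.state = State.LICENSED
            return True
        return False

    def fire(self, phase: Phase) -> bool:
        # CDK-dependent steps up to CMG formation, collapsed into one step.
        if phase is Phase.S and self.state is State.DDK_MARKED:
            self.state = State.FIRED
            return True
        return False

    def exit_mitosis(self) -> None:
        # A new cycle starts with an unlicensed, ORC-bound origin.
        self.state = State.ORC_BOUND


origin = Origin()
print(origin.license(Phase.G1))            # True: licensed in G1
print(origin.ddk_phosphorylate(Phase.S))   # True
print(origin.fire(Phase.S))                # True: the origin fires once
print(origin.fire(Phase.S))                # False: it cannot fire again

late = Origin()
print(late.license(Phase.S))               # False: no licensing in S phase
```

Because licensing is gated on G1 and firing consumes the licensed state, no sequence of calls within one cycle lets an origin fire twice; only `exit_mitosis` resets it.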
3. The Initiation of DNA Synthesis at Origins
The binding of the MCM2-7 complexes and their activation are well understood, as summarised above, but the transition from the dsDNA-binding mode of MCM2-7 proteins and CMG to the unwinding of the dsDNA origin into stretches of RPA-bound ssDNA is less well understood. However, recent cryoEM studies have provided some insights into this process in humans and yeast [50,64]. After MCM-DHs are loaded onto origin dsDNA in an ATP hydrolysis-dependent manner, a stable ADP-MCM2-7 complex is formed, the 'MCM6 wedge' (an N-terminal pore loop in MCM6) interacts with dsDNA at the MCM-DH interface, and an initial open structure is generated at the hexamer junction [50,64]. Subsequently, the pre-initiation complex (pre-IC) forms, which involves DDK and CDK phosphorylation as well as the association of CDC45 and GINS, resulting in CMG complex formation (Figure 1, [50]). The ATP hydrolysis-dependent opening and untwisting of dsDNA follows a mechanism in which MCM2-7 subunits push one strand and pull the other, while interactions of the MCM6 wedge with the partially single-stranded bases allow the first base pairs of the dsDNA to be broken [50,64]. Next, after a putative further rotation of the DNA-bound CMG and a reconfiguration of the CMG-DNA complex with the help of MCM10 and further ATP hydrolysis, two RFs are formed in budding yeast, with Pol ε-CMG complexes as the replicative helicase complexes on the leading strands [50]. Then, like SV40 T antigen in viral DNA replication, CDC45 of the CMG complex loads RPA onto the emerging ssDNA; these ssDNA stretches are further extended with the help of RPA, and the two CMG complexes move on ssDNA in the 3′-5′ direction, yielding two RFs [58,60,65,66].
In the next step, primase, consisting of PRIM1/PRI1, the catalytic subunit [67], and PRIM2/PRI2, the regulatory subunit [68,69], is recruited to the RF as part of the four-subunit Pol α complex via an AND-1/CTF4/WDHD1 trimer, which associates with the CMG complex at RFs (Figure 1 and Figure 2, [1,45,63,70]). Primase synthesises short RNA primers on the CMG-RPA-bound ssDNA templates, initiating leading strand DNA synthesis (Figure 2, [43,60,71,72,73,74]). Next, a protein complex involving RPA, PRIM2/PRI2-CT, and other partners of the initiation complex hands over the RNA primer annealed to the ssDNA template to the catalytic centre of PolA1 of Pol α, which then elongates the primer with a short stretch of DNA (an additional ~20 nts, Figure 2, [1,43,60,75]). After synthesising this short DNA sequence, Pol α leaves the template, and RFC-PCNA binds this RNA-DNA primer and associates with DNA polymerase δ (Pol δ). Pol δ is a four-subunit protein complex whose largest subunit, p125/Pol3, has the DNA polymerase activity that elongates the RNA-DNA primer. Unlike Pol α-p180, the p125/Pol3 subunit also contains an active 3′-5′ proofreading exonuclease [1,76,77,78]. Then, according to the 'division of labour' model of leading strand DNA synthesis, Pol δ hands the DNA molecule over to DNA polymerase ε (Pol ε), which associates with the PCNA-bound DNA and elongates it. Pol ε has a high intrinsic processivity and can synthesise leading strand DNA as long as a full replicon in one go. Pol ε also consists of four subunits, of which the largest, PolE1/Pol2, has both DNA polymerase and proofreading 3′-5′ exonuclease activities, the latter correcting misincorporated nucleotides in the leading strand [1,76]. Although Pol ε can synthesise a full replicon's worth of DNA in one processivity cycle, under exceptional circumstances DNA synthesis stalls, e.g., when Pol ε encounters a DNA lesion, and replication fork stabilisation mechanisms are activated, as discussed below (Section 5, [42,72,73,79]).
4. Initiation Processes at DNA replication forks—Okazaki Fragment Synthesis
During the initiation of DNA replication at origins, the CMG helicase produces two RFs, each with two ssDNA templates [1,43,81]. Because the two strands are antiparallel, at each RF one ssDNA template has a 3′-5′ orientation in the direction of fork movement and serves as the template for leading strand synthesis, which is continuous, as described above. In contrast, the second strand has a 5′-3′ orientation and does not allow continuous DNA synthesis by DNA polymerases. To overcome this problem, the second strand, the lagging strand, is synthesised discontinuously in short fragments of 200-300 nucleotides in eukaryotes, called Okazaki fragments. Thus, for each cell duplication, the cellular replication machinery must initiate, elongate, and mature ~30 million Okazaki fragments on the lagging strand to replicate a human genome completely [1,32,43].
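The ~30 million figure can be checked with simple arithmetic. The sketch assumes a diploid human genome of ~6.4 × 10^9 bp (not stated in this review) and uses the 200-300 nt fragment lengths given above; because exactly one of the two template strands at any position is copied as lagging strand, total lagging-strand DNA is roughly equal to the genome size.

```python
DIPLOID_HUMAN_GENOME_BP = 6.4e9  # assumption: ~2 x 3.2 Gb per diploid cell


def okazaki_fragments(genome_bp: float, fragment_nt: float) -> float:
    """Approximate number of Okazaki fragments per genome duplication.

    At every position exactly one of the two template strands is copied as
    lagging strand, so total lagging-strand DNA is about equal to genome_bp.
    """
    return genome_bp / fragment_nt


for size_nt in (200, 300):  # Okazaki fragment length range given in the text
    n = okazaki_fragments(DIPLOID_HUMAN_GENOME_BP, size_nt)
    print(f"{size_nt} nt fragments -> ~{n / 1e6:.1f} million")
# 200 nt fragments -> ~32.0 million
# 300 nt fragments -> ~21.3 million
```

The ~30 million quoted in the text therefore corresponds to the shorter end of the Okazaki fragment size range.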
Recent cryo-EM structural studies of budding yeast and human replisomes, as well as of CST-Pol α complexes, have given profound insight into the mechanism of Okazaki fragment initiation at RFs and at telomere sequences [39,62,82,83]. The RF structures by Jones et al. show that Pol α has multiple interactions with the CMG complex and the AND-1/CTF4/WDHD1 complex (summarised in Figure 2, [62]). In particular, MCM3/P1 and the GINS subunits make direct contacts with the primase subunit PRIM2/PRI2 and with the second largest Pol α subunit, PolA2/Pol12 [62,84,85,86,87]. Interestingly, most of these interaction sites of the CMG helicase and AND-1 with Pol α are located within intrinsically disordered regions (IDRs), which have some intrinsic structural flexibility. Owing to this flexibility, these interactions may allow Pol α to move on the lagging strand template during primer synthesis while CMG advances on the leading strand template, unwinding the dsDNA [43,45,62]. Importantly, these protein-protein and CMG-DNA interactions position the catalytic primase subunit of Pol α, PRIM1/PRI1, close to MCM5 and the opening of the exit channel for the lagging-strand template ssDNA [62]. In the yeast cryo-EM structure, the MCM5 zinc finger makes physical contacts with yeast PRI1, but this interaction is not conserved in the human Pol α-AND-1-CMG structure [62].
During primer synthesis, PRIM1/PRI1 and PRIM2/PRI2 bind to the ssDNA template and catalyse the formation of the initial dinucleotide, the rate-limiting step of primer synthesis, using two incoming ribonucleoside 5′-triphosphates and divalent metal cations [67,68,70,88,89,90,91,92,93,94]. The primase regulatory subunit PRIM2/PRI2 consists of two domains, an N-terminal and a C-terminal domain, PRIM2N/PRI2N and PRIM2C/PRI2C, respectively [43,60,70,75,82,88,94,95,96]. PRIM2N/PRI2N serves as a protein interaction platform and links the catalytic PRIM1/PRI1 subunit to the C-terminal domain of PolA1. The latter serves as a scaffold for Pol α complex formation, binding both the primase dimer and the second largest subunit, PolA2/p70/68/B-subunit, an essential regulatory subunit of Pol α without known enzymatic activity. After forming the initial dinucleotide, PRIM1/PRI1 elongates it to 7-10 nucleotides in a distributive manner, because PRIM1/PRI1 and its active site frequently detach from the newly formed oligoribonucleotides (e.g., after three nucleotides) owing to the weak interactions of PRIM1/PRI1 with di- and trinucleotide substrates [94,97]. PRIM2C supports the catalytic cycle by acting as a processivity factor: it stays bound to the 5′-end of the primer, allowing PRIM1/PRI1 to re-associate with the 3′-end of the RNA primer and elongate it [68,70,88,91,94,95]. However, when the primer reaches ~10 nucleotides, PRIM2C can no longer support further elongation by PRIM1/PRI1 because of a steric clash between PRIM2C and PRIM2N [70,88,95]. Instead, PRIM2C hands the RNA primer over to PolA1 for further extension, with the help of auxiliary mediator proteins such as RPA and CST [43,60,70,75,82,83,94,95,96]. For this to happen, the primer must be at least 7 nucleotides long, as PolA1 cannot associate with shorter primers. Once the primer has reached this length, PRIM1/PRI1 and PolA1 compete for primer binding [94]. PolA1 has a higher affinity for primers of this size, which makes its binding more favourable and enables the hand-over. PolA1 then adds ~20 deoxyribonucleoside monophosphates (dNMPs) to the primer, creating an RNA-DNA hybrid primer of ~30 nucleotides [92,96,97,98].
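The length rules above (dinucleotide start, distributive extension, PolA1 competition from 7 nt, a hard stop at ~10 nt, then ~20 nt of DNA) can be combined into a small stochastic toy model. This is a sketch under stated assumptions: the per-step probabilities `p_detach` and `p_handover` are hypothetical parameters not given in the text, and real kinetics are far richer.

```python
import random
from typing import Tuple

MIN_POLA1_PRIMER_NT = 7   # PolA1 cannot bind shorter primers (from the text)
MAX_RNA_PRIMER_NT = 10    # PRIM2C/PRIM2N clash stops primase at ~10 nt (from the text)
POLA1_DNA_NT = 20         # ~20 dNMPs added by PolA1 (from the text)


def synthesise_primer(rng: random.Random,
                      p_detach: float = 0.3,
                      p_handover: float = 0.5) -> Tuple[int, int, int]:
    """Toy model of one Pol α-primase primer.

    Returns (RNA length, total RNA-DNA length, PRIM1/PRI1 binding events).
    p_detach and p_handover are hypothetical per-step probabilities.
    """
    rna_nt = 2            # initial dinucleotide, the rate-limiting step
    binding_events = 1
    while rna_nt < MAX_RNA_PRIMER_NT:
        # From 7 nt on, PolA1 competes with PRIM1/PRI1 for the primer.
        if rna_nt >= MIN_POLA1_PRIMER_NT and rng.random() < p_handover:
            break
        # Distributive synthesis: PRIM1/PRI1 may fall off, but PRIM2C keeps
        # hold of the 5'-end so PRIM1/PRI1 can rebind the 3'-end.
        if rng.random() < p_detach:
            binding_events += 1
        rna_nt += 1
    # Either PolA1 won the competition or primase hit the ~10 nt limit;
    # PolA1 now extends the primer with DNA.
    return rna_nt, rna_nt + POLA1_DNA_NT, binding_events


rng = random.Random(42)
for _ in range(3):
    print(synthesise_primer(rng))
```

Whatever values the hypothetical probabilities take, the model always yields an RNA segment of 7-10 nt and a total RNA-DNA primer of 27-30 nt, matching the ~30 nt hybrid described above.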
In the next step, the RNA-DNA primer is handed over to Pol δ with the help of RPA and RFC. RFC loads the homotrimeric PCNA, also called the replication clamp, onto the primed DNA and supports the Pol δ-primer interaction, allowing Pol δ to elongate the Okazaki fragment processively [1,81,99,100,101,102,103]. When Pol δ reaches the 5′-end of the previous Okazaki fragment, it starts to displace the RNA and parts of the Pol α-synthesised DNA as a first step of Okazaki fragment maturation (Figure 2, inserted panels, [1,35,102,103,104,105,106,107,108,109]). The resulting flap is then cleaved by FEN1 (flap endonuclease 1), which associates with PCNA in parallel with Pol δ and LIG1 (DNA ligase 1) [102,103,104,105,108]. FEN1 cleavage yields two adjacent DNA strands separated by a nick, the ideal substrate for LIG1, which joins them in an ATP-dependent manner, yielding a continuous stretch of lagging strand DNA (Figure 2, [102,103,110]). Alternatively, RPA binds to the Pol δ-displaced flap, which FEN1 can then no longer access, and recruits the endonuclease/helicase DNA2. DNA2 cleaves the flap one nucleotide away from the dsDNA part of the sequence, leaving a product that LIG1 cannot ligate. Instead, FEN1 removes the remaining single nucleotide and yields a nicked dsDNA that LIG1 ligates, as described above [102,103,110]. Interestingly, the maturation enzyme DNA2 serves as one of the sensors of ongoing unperturbed DNA replication (see Figure 3 and Section 5, [111,112]). Because removal of the RNA portion by the maturation process described above is slow, prior removal of the RNA primer by RNase H accelerates maturation by about one order of magnitude [102].
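The two maturation routes (direct FEN1 cleavage versus RPA-DNA2 trimming followed by FEN1) can be written as a simple branching procedure. In this sketch the flap length at which RPA is assumed to take over, `RPA_FLAP_THRESHOLD_NT = 30`, is a hypothetical value chosen for illustration; the text states only that RPA can coat the displaced flap, not at what length.

```python
from typing import List

# Hypothetical cut-off: the text says RPA can coat displaced flaps, but gives
# no length; 30 nt is an assumption for illustration only.
RPA_FLAP_THRESHOLD_NT = 30


def mature_okazaki_fragment(flap_nt: int,
                            rpa_threshold_nt: int = RPA_FLAP_THRESHOLD_NT) -> List[str]:
    """Ordered steps turning a Pol δ-displaced 5'-flap into ligated lagging-strand DNA."""
    if flap_nt < 1:
        raise ValueError("flap must be at least 1 nt")
    steps: List[str] = []
    if flap_nt >= rpa_threshold_nt:
        # RPA-coated flaps are inaccessible to FEN1; RPA recruits DNA2 instead.
        steps.append(f"RPA binds {flap_nt}-nt flap")
        steps.append("DNA2 cuts, leaving a 1-nt flap (not ligatable)")
        flap_nt = 1
    # FEN1 removes the (remaining) flap and leaves a nick.
    steps.append(f"FEN1 removes {flap_nt}-nt flap, leaving a nick")
    steps.append("LIG1 seals the nick (ATP-dependent)")
    return steps


for flap in (8, 45):
    print(flap, "->", " | ".join(mature_okazaki_fragment(flap)))
```

Both branches converge on the same FEN1-generated nick, which is why LIG1 is the final step in either route.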
5. Challenges at Replication Forks—Replication Fork Stalling
RFs are tightly controlled, stable structures formed during eukaryotic chromosomal DNA replication in S phase [42,74,81,116,118,121,122,123,124,125,126]. Nevertheless, various problems, including DNA lesions, difficult-to-replicate DNA sequences, and collisions with the transcription machinery, can interfere with the progression of RFs; these are summarised as 'replication stress' (Figure 3 and reviewed in [8,41,42,72,116,118,125,126,127]). Such stress can cause DNA polymerases to stall or RF movement to slow down, which in turn threatens the timely progression and fidelity of human genome duplication. Stalled RFs can threaten genome stability, for example through DNA double-strand breaks (DSBs) or incomplete sister chromatid separation during mitosis. If arrested RFs are not appropriately processed and restarted, they become unstable and may collapse [8,41,42,72,116,118,125,126,127]. Numerous issues, such as shortages of nucleotides or replication factors, the misalignment of nucleotides, lesions in the DNA template, or malfunctions during the unwinding process, can cause replication stress. Although these challenges normally arise sporadically, oncogenic activation of proteins such as E2F and mutations of tumour suppressor proteins such as p53 and pRB also frequently cause replication stress [41,42,72,116,118,125,127].
However, eukaryotic cells have adapted to such challenges, and several pathways, the so-called DNA damage response (DDR) pathways, have evolved to stabilise, repair, and restart stalled forks and to prevent additional DNA lesions and aberrant structures from progressing further through the cell cycle [41,42,72,116,118,125,127,128]. These DDR pathways, which comprise numerous proteins including the protein kinases ATM (ataxia telangiectasia mutated) and especially ATR (ATM- and Rad3-related), have been well described in recent reviews [118,125,126,128,129,130,131]. In the present review, we focus on the roles of DNA synthesis initiation reactions in establishing the replication stress signal, their connections to DDR pathways (Figure 3), and how they support the restart of DNA replication.
It is important to note that ATR is active not only during replication stress but also during normal replication, when it monitors RF progression and slows down RFs to avoid, e.g., nucleotide shortages (Figure 3 top panel, [111,112,126,132]). These ATR-dependent pathways may depend on the loading of the 9-1-1 complex onto the 5′-ends of Okazaki fragments, the interaction of TOPBP1 with the 9-1-1 checkpoint clamp (called Rad9-Rad1-Hus1 in S. pombe and humans, Ddc1-Rad17-Mec3 in S. cerevisiae), DNA2, and Pol α (Figure 3, [111,112,119,120]). A second function of ATR (via the so-called 'canonical' ATR pathway [119,126]) is to monitor and signal replication stress when RFs progress through difficult DNA sequences, such as repetitive sequences and sequences containing G-quadruplexes (G4s), or encounter DNA lesions that impede their progress. Lesions or problematic DNA sequences on the lagging strand are easier to handle owing to its discontinuous nature. They can easily be bypassed by initiating new Okazaki fragments, allowing later repair of the lesion or its bypass via translesion DNA synthesis by specialised DNA polymerases [117,119,120]. Nevertheless, these unmatured Okazaki fragments may cause or contribute to replication stress signals (Figure 3B).
In contrast, on leading strand templates, Pol ε stops when it encounters a lesion and disengages from the CMG helicase or modifies its activity. As a result, the CMG helicase continues to unwind DNA and load RPA onto the leading strand template ssDNA, while lagging strand synthesis continues and Pol α produces RNA-DNA primers. RFC-PCNA-Pol δ can extend these primers to form Okazaki fragments, but if the latter are not matured, they produce ssDNA-dsDNA/RNA junctions with free 5′-ends, which are recognised by Rad17/24-RFC, the 9-1-1 clamp loader, which loads the 9-1-1 clamp onto these structures [114,116,129,133,134,135,136,137]. Notably, RNA primers produced by primase are sufficient for checkpoint initiation [114,138]. The 9-1-1 complex and Pol α on the lagging strand template then recruit TOPBP1 to the stalled RF, whereas RPA, an ATR substrate, on the leading strand template binds to the ATR-binding protein ATRIP (Figure 3B, [61,116,119,139]). In addition, the stalled Pol ε may interact with TOPBP1 and stabilise its association with the stalled RF. In summary, this protein complex, with TOPBP1 at its centre, forms a protein bridge linking the leading and lagging strand templates. This arrangement binds and activates ATR, the so-called canonical ATR activation pathway, which then initiates DDR pathways and helps to stabilise the RF under replication stress conditions, in part by activating checkpoint kinases such as RAD53 in yeast and CHK1 in metazoans [111,116,118,119]. The activation of ATR by ETAA1 (Ewing's tumor-associated antigen 1) with the help of RPA-ATRIP complexes on the leading strand could serve as an additional, TOPBP1-independent branch to activate or amplify ATR signals at stalled RFs [116,126]. In this model, multiple key factors cooperate to activate DDR pathways, slowing down RFs and preventing the firing of origins that are not yet active.
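The requirements of the canonical and ETAA1 branches can be condensed into Boolean logic. This is a deliberately coarse sketch: the `StalledFork` fields are hypothetical read-outs, treating 9-1-1 and Pol α as jointly required for TOPBP1 recruitment is a simplifying assumption, and real ATR signalling is graded rather than on/off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StalledFork:
    # Hypothetical Boolean read-outs of a stalled fork; names are illustrative.
    free_5prime_ends_on_lagging: bool  # unmatured primers/Okazaki fragments
    pol_alpha_on_lagging: bool
    rpa_ssdna_on_leading: bool         # CMG keeps unwinding after Pol ε stalls
    etaa1_available: bool = False


def atr_route(fork: StalledFork) -> str:
    """Which ATR activation route the simplified model in the text would predict."""
    # Rad17/24-RFC loads the 9-1-1 clamp at free 5'-ends.
    clamp_loaded = fork.free_5prime_ends_on_lagging
    # 9-1-1 and Pol α recruit TOPBP1 (treated here as both required).
    topbp1_recruited = clamp_loaded and fork.pol_alpha_on_lagging
    # RPA on leading-strand ssDNA recruits ATR via ATRIP.
    atr_atrip_recruited = fork.rpa_ssdna_on_leading
    if topbp1_recruited and atr_atrip_recruited:
        return "canonical: TOPBP1 bridges lagging and leading templates"
    if atr_atrip_recruited and fork.etaa1_available:
        return "ETAA1: TOPBP1-independent branch"
    return "no ATR activation"


print(atr_route(StalledFork(True, True, True)))
print(atr_route(StalledFork(False, True, True, etaa1_available=True)))
```

The structure makes the bridging idea explicit: the canonical route needs inputs from both templates, whereas the ETAA1 branch relies on the leading-strand RPA-ATRIP signal alone.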
The stalled RF can be released by a partial exchange of RPA for RAD51 with the help of BRCA2 and by RF reversal or, alternatively, by a bypass mechanism. For the latter, RPA recruits PrimPol (primase-polymerase), a second eukaryotic DNA polymerase with associated primase activity distinct from Pol α, to the leading strand template and induces PrimPol to synthesise a primer to restart leading strand DNA synthesis [127,140]. Importantly, Pol α associated with CMG at the RF cannot initiate on the leading strand outside the replication origin, but its activity is important for checkpoint initiation, a clear separation of tasks between Pol α and PrimPol [71,72,127,140]. However, to restart leading strand synthesis in G4-rich sequences, the CTC1-STN1-TEN1 (CST) complex, which normally acts at telomere sequences (see Section 6), can recruit Pol α to G4-rich templates and activate the enzyme complex to synthesise an RNA primer [141,142]. As a further alternative, dormant origins can be activated to restart leading and lagging strand DNA synthesis [143,144,145]. These processes would in turn solve the replication problem and allow replication of the genomic DNA to continue. However, if the replication problem(s) cannot be resolved, the situation may lead to cell death, e.g., via ATR-dependent CHK2 activation [146], or to continuous hyperactivation of ATR, resulting in senescence of these cells [147].
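The restart routes named in this and the preceding paragraphs can be collected into a simple lookup by blocked strand and sequence context. This is only an illustrative catalogue under the assumption that strand and G4 content are the relevant inputs; the text does not describe how cells choose between these routes, so the list order implies no priority, and `restart_options` is a hypothetical helper.

```python
from typing import List


def restart_options(blocked_strand: str, g4_rich: bool = False) -> List[str]:
    """Candidate restart routes named in the text for a blocked fork.

    Illustrative lookup only: the text does not specify how cells choose
    between these routes, so no ordering or priority is implied.
    """
    if blocked_strand == "lagging":
        # The discontinuous strand simply reprimes with a new Okazaki fragment.
        return ["new Okazaki fragment primed by Pol α beyond the obstacle"]
    if blocked_strand != "leading":
        raise ValueError("blocked_strand must be 'leading' or 'lagging'")
    options = [
        "RAD51 loading (BRCA2-assisted) and fork reversal",
        "PrimPol repriming on the leading strand template",
    ]
    if g4_rich:
        options.append("CST-recruited Pol α priming on G4-rich template")
    options.append("firing of dormant origins")
    return options


for strand, g4 in (("lagging", False), ("leading", False), ("leading", True)):
    print(strand, "G4" if g4 else "", restart_options(strand, g4))
```

Note the asymmetry the lookup encodes: CMG-associated Pol α is listed for the leading strand only when CST recruits it to G4-rich templates, consistent with its inability to initiate on the leading strand outside origins.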
6. Okazaki Fragment Synthesis and the End-Replication Problem at Telomeres
The end replication problem hypothesis refers to the challenge that eukaryotic cells faced by the lagging strand replication machinery to fully replicate DNA at the end of their linear chromosomes first hypothesised by Olovnikov [
148]. At telomeric ends of chromosomes on the lagging strand, primase synthesises the last oligoribonucleotide at or close to the end of the ssDNA template [
149]. Thus, the removal of the RNA primers and putatively parts of the Pol α-synthesised DNA results in a shortened lagging strand, which cannot be fully synthesised by the DNA replication machinery [
150]. This yields the shortening of one of the newly synthesised chromosomes and the loss of genetic material in the next DNA replication round [
151]. Multiple replication rounds will finally result in the loss of genetic information. Telomeres, nucleoprotein complexes consisting of repeated ‘TTAGGG’ sequences and proteins at the end of chromosomes, are responsible for the protection of eukaryotic genomes [
152,
153,
154]. Telomeres have a special DNA structure including a region of dsDNA ending with a 3′ tail containing ssDNA known as the G overhang. Additionally, telomeres consist of proteins including the shelterin complex, which plays a crucial role in the regulation and protection of telomeres [
151,
155]. At telomere ends, the shelterin complex also controls the activity of telomerase, a reverse transcriptase that carries its own RNA template and extends the G overhang with telomeric DNA repeats. On the one hand, the CST complex, an RPA-like ssDNA-binding protein complex consisting of CTC1, STN1 and TEN1, limits DNA synthesis by telomerase. On the other hand, CST binds to Pol α, recruits it to telomeric sequences, and remodels it from an inactive into an initiation-competent enzyme complex, which in turn synthesises an RNA primer close to the end of the template strand, thereby initiating C-strand synthesis [
43,
75,
96,
151]. Next, CST and PRIM2/PRI2-CT hand over the RNA primer to the DNA polymerase domain of PolA1 in the Pol α complex, which extends the primer with dNMPs. PCNA-Pol δ then elongates the RNA-DNA primer until the C strand is fully synthesised [
82,
83,
96]. The importance of this process is underlined by human diseases such as Coats plus syndrome, which is associated with mutations in CST subunits [
156,
157]. Recent findings suggest that the CST-Pol α complex also solves a second, until recently little-recognised telomere end-replication problem: in the absence of CST-Pol α, not only is the lagging strand at telomeres shortened, but the leading-strand DNA is also resected in the next round of DNA replication [
150].
7. Outlook
The faithful replication of cellular DNA, including the replication initiation reactions, is central to maintaining human cell function and to avoiding genetic diseases, including cancer [
158,
159,
160]. Cancer cells proliferate and replicate their DNA in an uncontrolled manner, which leads to abnormal increases in cell number. Loss of control over replication initiation is often a cause of genome instability, which in turn is a property that enables cancer development [
159,
160]. Beyond cancer, genetic diseases such as Meier-Gorlin syndrome, a form of microcephalic primordial dwarfism, and Coats plus syndrome (also called cerebroretinal microangiopathy with calcifications and cysts, CRMCC), a multi-organ disorder affecting the brain, eye, and gastrointestinal tract, are associated with mutations in central players of DNA synthesis initiation such as MCM2-7, CDC45 and the CST complex [
28,
83,
156,
161].
Here, a better understanding of replication processes, such as the initiation of DNA replication at origins and Okazaki fragment synthesis, may help in developing treatments for these rare diseases. Advances in our knowledge of the structures, protein-protein interactions, and post-translational modifications (PTMs) of the replication factors involved in initiation and at the replication fork, obtained using cryo-EM and modern mass spectrometry, including single-cell approaches, will provide the basis for developing such treatments [
8,
62,
162,
163,
164,
165]. Next-generation sequencing techniques adapted to DNA replication processes, as seen for the genome-wide mapping of Okazaki fragment distributions, will help to map origins and their dynamics during normal and perturbed DNA replication. Applying these methods not only in yeast but also in human cells and other organisms, which has in part already begun, will improve our understanding of the regulation of DNA replication during normal and perturbed cell cycles, as well as of dysfunctional replication control in cancer cells [
35,
166]. The latter may help to identify new Achilles' heels of cancer cells that can serve as targets for treatment development or as markers for cancer diagnosis and prognosis. One recent example is cancer diagnostics based on the overexpression of MCM2-7 proteins, which has been shown in a variety of cancers such as renal cell carcinoma and breast, prostate, and lung cancer. Increased levels of MCM2-7 have usually been associated with a poor prognosis and aggressive tumour behaviour [
167,
168,
169]. Additionally, fundamental research on replication initiation pathways has led to the development of new inhibitors for cancer therapy that target, for example, CDC7 and the CMG helicase [
170,
171,
172,
173,
174].