1. Introduction: Karyotype Diversity and Species Richness
The origins of the biodiversity comprising the Tree of Life are the subject of longstanding and ongoing debates in evolutionary biology. Darwin characterized the astonishing species diversity among angiosperms (some 300,000 species) as an “abominable mystery”: “the rapid development … of all higher order plants within recent geological times” [
1]. The generation of biodiversity involves two fundamental but biologically independent variables: mutation and selection. Mutation events, which alter genotypes, have been assumed to occur randomly at the molecular and genetic levels, while natural selection is expected to act non-randomly on the correspondingly altered phenotypes at the individual and population levels.
A major question of interest concerns how these two variables interact to establish an equilibrium, or balance, between the forces of mutation and selection during the process of speciation. Most mutations are deleterious to the organism and undergo negative, or purifying, selection, while other mutations are beneficial and undergo positive selection, or adaptation. A third class of mutations is neither beneficial nor harmful but instead “neutral” or “nearly neutral”, meaning that such mutations are expected to have either negligible or no effect on an organism’s differential fitness. Neutral mutations become fixed, or substituted, in a population through random genetic drift rather than by Darwinian natural selection.
By analogy with symbiosis, deleterious mutations can be seen as acting as parasites that harm the organism; beneficial mutations can be seen as acting as mutualists that provide an advantage to both host and symbiont (the mutation itself); while neutral mutations can be seen as acting as commensals that benefit the symbiont without significantly harming the host. The analogy is not far-fetched: eukaryotes emerged as a result of a symbiotic and mutagenic event that eventually resulted in the invasion of non-coding symbiotic DNA into the host genome. The characteristic of being neutral has, along with other mutational events such as whole genome amplifications (polyploidy), resulted in the astonishing range of genome sizes and architectures across the eukaryote Tree of Life, in striking contrast to the relatively streamlined genome size range found in prokaryotes [
2].
Eukaryote genomes are organized in individual units (chromosomes) of differing numbers and sizes (karyotypes). Genomes themselves vary enormously in size across both plants and animals. Animal genome sizes, for example, range from 0.04 picograms (pg; C-value = 1C) in Trichoplax adhaerens to 133 pg in Protopterus aethiopicus, or about 3,300 fold. In plants, genome sizes range over 2,400 fold from 0.06 pg to about 152 pg, the largest known eukaryote genome [3, 4]. In contrast, the genomes of bacteria and Archaea range in size from 0.1 to 16 Mbp, or about 160 fold [5].
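The fold ranges quoted above follow directly from the smallest and largest genome sizes in each group. A minimal sketch of that arithmetic, using only the values given in the text (the dictionary and function names are illustrative):

```python
# Smallest and largest genome sizes as quoted in the text.
genome_size_ranges = {
    "animals (pg)": (0.04, 133.0),
    "plants (pg)": (0.06, 152.0),
    "bacteria and archaea (Mbp)": (0.1, 16.0),
}


def fold_range(smallest, largest):
    """Return how many times larger the largest genome is than the smallest."""
    return largest / smallest


for group, (smallest, largest) in genome_size_ranges.items():
    print(f"{group}: about {fold_range(smallest, largest):,.0f}-fold")
```

The ratios are unit-free, so the picogram and Mbp ranges can be compared directly as fold differences.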
Changes in karyotypes and genome sizes are closely associated with corresponding differences in species richness within different taxonomic groups such as mammals and salamanders. Karyotype diversity (KD), moreover, is highly correlated with species richness (SR; number of species in a clade) across closely and distantly related clades. The evenness in karyotype diversity also closely correlates with species evenness in the mammalian phylogenetic tree, evenness reflecting the balance/imbalance in the distribution of either KD or SR. In each case, the respective size distributions are significantly skewed in parallel, with lower KD aligning closely with lower SR and vice versa (Figure 1).
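To make the two quantities concrete, the sketch below computes a Pearson correlation between KD and SR and an evenness index for a distribution of counts. The text does not state which evenness measure was used; Pielou's J (Shannon entropy divided by its maximum) is assumed here as one common choice. All numbers are hypothetical placeholders, not data from the studies cited.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pielou_evenness(counts):
    """Pielou's J: 1.0 for a perfectly even distribution, near 0 when skewed."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    if len(proportions) < 2:
        raise ValueError("evenness needs at least two non-empty groups")
    shannon = -sum(p * math.log(p) for p in proportions)
    return shannon / math.log(len(proportions))


# HYPOTHETICAL per-clade values, for illustration only.
species_richness = [12, 45, 150, 600, 2300]
karyotype_diversity = [2, 6, 14, 35, 90]

print("r(KD, SR) =", round(pearson_r(karyotype_diversity, species_richness), 3))
print("SR evenness =", round(pielou_evenness(species_richness), 3))
print("KD evenness =", round(pielou_evenness(karyotype_diversity), 3))
```

Under this measure, two distributions that are "skewed in parallel" would show similarly low evenness values.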
A still unanswered question arises, at least concerning the Mammalian lineage: are the correlations between KD and SR and the correlations of evenness in the SR and KD distributions a trivial consequence of Neo-Darwinian natural selection acting on random genome modifications and rearrangements? Or do the correlations reflect processes of non-adaptive radiation that result from a balance between mutational inputs and neutral substitutions in a population? In other words, is speciation initially an adaptively neutral phenomenon resulting from relaxed selection, which generates the genetic diversity on which Darwinian natural selection ultimately acts? And if so, to what extent can relaxed selection and genetic drift explain or account for the widely varying distributions of SR observed in most, if not all, animal and plant lineages?
The question is pertinent given that most of the eukaryotic genome consists of non-coding, apparently neutral DNA that derives principally from a variety of different transposable elements (TEs). This “neutral DNA” is largely responsible for the 64,000 fold range in genome sizes found in eukaryotes [
6], yet the number of genes in any given lineage varies little. The average number of genes in vertebrates, for example, is about 20,000, while the average number in invertebrates is about [
7]. Rats and humans have approximately the same number of genes (22,000) [8]. Although neutral, TEs are not randomly located in the genome but are compartmentalized in late replicating, gene-poor heterochromatin.
It would appear then that species in a lineage differ from each other more in the amounts of non-coding DNA than in the amount of DNA on which natural selection is expected to act, namely genes, regulatory elements and the organization of genes in the genome (synteny). Synteny in mammals, for example, has been remarkably conserved for more than 300 My [
9], yet the mammalian genome has a highly variable karyotype (2n = 6-7 in the Indian muntjac to over 100 in some rodents) and a significantly large range in C-value of about 358 fold [
10,
11].
Although genome size often correlates negatively with rates of evolution in plants and animals, genome stability and rates of change in C-value appear to underlie rates of macroevolution [
12]. Salamanders, for example, have large and highly conserved genomes that are very slowly evolving (C-value: > 10 pg); frogs have smaller genomes that have much faster rates of evolution (C-value: < 10 pg), while mammals have rates of genome evolution 20 times faster than anurans and a much more restricted range of C-value compared to the Amphibia. Importantly, synteny is also highly conserved in frogs and salamanders [
13] (
Figure 2).
Remarkably, natural selection has not purged the eukaryote genome of this ostensibly useless DNA, suggesting that it might play a role in adaptation and speciation, for example, in consolidating reproductive isolation [
14]. While non-coding DNA itself might be biologically inert, the heterochromatin that it forms plays a number of vital roles in transcription, DNA repair, DNA replication timing, and differentiation and development. The following will look at the potential biological functions of non-coding DNA and heterochromatin in relationship to those factors contributing to adaptation and speciation.
2. Non-Adaptive Radiation: Ecological Selection vs. Genetic Drift
Motoo Kimura proposed a hypothesis of non-adaptive radiation (NAR) based on genetic drift, or the random fixation of an allele or genotype in a population [
15]. The neo-Darwinian hypothesis, in contrast, holds that natural selection acting on an advantageous variant phenotype is the primary and principal driver behind fixing a genotype variant in a population [
16,
17]. The NAR hypothesis rests on the assumption that substitution rates equal mutation rates (mutation/substitution balance): mutation rates determine rates of substitution and, consequently, rates of speciation.
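The mutation/substitution balance invoked here corresponds to the standard result of the neutral theory, sketched below for a diploid population. The symbols N (population size), μ (neutral mutation rate per gene copy per generation) and k (substitution rate per generation) are notational assumptions introduced for this sketch.

```latex
\begin{align}
\text{new neutral mutations per generation} &= 2N\mu \\
\text{fixation probability of each new neutral mutation} &= \frac{1}{2N} \\
k &= 2N\mu \times \frac{1}{2N} = \mu
\end{align}
```

Population size cancels, so under strict neutrality the substitution rate depends only on the mutation rate.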
Although genetic drift might drive a mutation to substitution and fixation in a group with a small effective population size (Ne), Kimura’s NAR does not assume that Darwinian natural selection plays a minor or insignificant role in establishing reproductive isolation, for example, through the effects of speciation genes, genomic modifications resulting in incompatible karyotypes and other pre- and postzygotic barriers to gene flow [18]. It remains unclear, however, how these two evolutionary forces, drift and selection, interact during the processes of speciation and adaptive radiation [19].
Theories of non-adaptive radiation have been proposed ever since Darwin. Non-adaptive radiation corresponds to lineage diversification in the absence of environmental shifts or evident niche divergence [
20,
21,
22]. In contrast to ecologically based theories of non-adaptive radiation, Kimura’s theory focuses on niche-neutral genotype radiations at the molecular genetic level. The theory rests on four fundamental stages defining the speciation process:
1) Relaxation of a selective constraint (a weakened negative, or purifying, selection) resulting in a burst in the number of new gene and genotype variants;
2) Differential fixation of variants in a population, or subpopulations, under the force of genetic drift;
3) Rapid habitat-driven diversification into new niches and environments (ecological selection);
4) Competitive exclusion between related groups leading to extensive adaptive evolution and radically different taxa following successful adaptation to new ecological niches.
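The drift step (stage 2) can be illustrated with a minimal Wright-Fisher sketch of a single neutral allele drifting to loss or fixation. The function name, population sizes, starting frequency and replicate count are illustrative assumptions, not values from the NAR literature. The model ignores selection, mutation and migration.

```python
import random


def wright_fisher(n_e, p0, rng, max_generations=100_000):
    """Simulate one neutral allele in a diploid population of size n_e.

    Returns (final_frequency, generations_elapsed). Pure drift only:
    no selection, mutation or migration (simplifying assumptions).
    """
    copies = 2 * n_e                      # gene copies in a diploid population
    count = round(p0 * copies)            # current copies of the focal allele
    for generation in range(max_generations):
        if count in (0, copies):          # allele lost or fixed
            return count / copies, generation
        p = count / copies
        # Binomial sampling of the next generation, one gene copy at a time.
        count = sum(1 for _ in range(copies) if rng.random() < p)
    return count / copies, max_generations


rng = random.Random(1)                    # fixed seed for reproducibility
for n_e in (10, 50):                      # hypothetical population sizes
    runs = [wright_fisher(n_e, 0.5, rng) for _ in range(100)]
    fixed = sum(1 for freq, _ in runs if freq == 1.0) / len(runs)
    mean_time = sum(gens for _, gens in runs) / len(runs)
    print(f"Ne={n_e}: fixed in {fixed:.0%} of runs, "
          f"mean absorption time {mean_time:.0f} generations")
```

In runs of this kind the smaller population is expected to reach loss or fixation in fewer generations, which is the sense in which small Ne favors differential fixation by drift.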
A substantial amount of evidence has accumulated in support of the NAR hypothesis since it was first formulated in 1991. The role of relaxed selection in influencing evolutionary rates is well established in plants and animals (Stage 1) [
23,
24,
25,
26]. Relaxed purifying selection is associated with changes in genome size (both expansions and reductions) and altered genome architecture and karyotypes [
27,
28]. The role of genetic drift in modulating genome sizes, however, remains unclear (Stage 2), but is expected to contribute significantly in the ancestral population during early stages of adaptive radiation [
29,
30,
31]. The expected increase in mutational loads under relaxed selection in populations with small
Ne enhances the levels of standing genetic variation under conditions of balancing selection (Stage 3) [
32,
33,
34,
35]. Balancing selection acts to maintain diversity in a population over long periods of time [
36,
37]. The corresponding elevated levels of genetic diversity (GD) in turn promote speciation when variants invade new niches and habitats (Stage 4). Population differentiation, for example, is related to speciation rates over evolutionary time [
38].
Implicit in the NAR hypothesis is a time lag between stage 1 (stochastic divergence between isolated populations) and stage 4 (ecological selection and adaptation) [
39]: the four stages take place in succession, or nearly in succession, over millions of years rather than simultaneously or in parallel [
40]. On a microevolutionary scale, diversification without morphological change has been observed in plants, lizards and salamanders: rates of species diversification are not coincident with ecological and phenotypic evolution, while ecological and phenotypic evolution co-occur in time as expected according to ecological speciation [
41]. These findings are more consistent with a primarily niche neutral diversification model than with models of simple density-dependent diversification [
42]. Hence, the speciation process corresponds to a repeated cycle of niche neutral diversification followed by a period of density dependent ecological adaptation.
Other examples of neutral genotype diversification relate to genotype-phenotype maps and the neutral sets or networks they form [
43]. More than one genotype can code for a single phenotype. The size distribution of neutral sets varies substantially, with any given phenotype mapping to multiple genotypes [
44]. Since ecological selection acts on the individual phenotype, neutral sets of genotypes indicate a widely varying amount of degeneracy that is perhaps a signature of genetic drift [
45].
Genotype-phenotype degeneracy can then be seen as analogous to the degeneracy in the genetic code [
46,
47], which provided an initial insight into the neutral theory of evolution. Neutral divergence of the genotype is therefore operating within the selective constraints that fix a phenotype in a population [
48]. Phenotype plasticity and “epigenomic drift”, or the accumulation of stochastic epigenetic modifications, can also generate other forms of neutral and non-neutral genomic and genetic diversity [
49,
50,
51,
52]. Another example of protein evolution via the force of genetic drift concerns rapidly evolving intrinsically disordered proteins, which increase in number with organism complexity [
53].
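The degeneracy of the genetic code mentioned above gives a small, concrete case of several genotypes mapping to one phenotype. The sketch below counts how many codons specify each amino acid in the standard genetic code, treating each amino acid as a "phenotype" and its synonymous codons as a neutral set. That framing is an illustrative simplification, not a model from the cited work.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons (genotypes) to its amino acid (phenotype).
codon_table = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Size of each neutral set: number of synonymous codons per amino acid.
degeneracy = Counter(codon_table.values())

for amino_acid, n_codons in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
    print(amino_acid, n_codons)
```

The set sizes range from one codon (methionine, tryptophan) to six (leucine, serine, arginine), a small-scale version of the uneven neutral-set size distribution described for genotype-phenotype maps.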
3. Genome Stability and Rates of Speciation: Karyotype Diversity Versus Gene Diversity in Determining Species Richness
As early as the 1970s, a clear distinction had been established between karyotype diversity and genetic diversity and their respective relationships to species richness [
54,
55]. Taken together, the observations suggested that “evolution at the organismal level is correlated more highly with karyotype evolution than with structural gene evolution.” [
56]. Moreover, it was found that rates of karyotype evolution varied significantly among different taxonomic groups whereas rates of change in structural genes were about the same.
Another study found a negative correlation between levels of gene heterozygosity and rates of chromosomal speciation, suggesting that rates of speciation increase in populations with small
Ne (low heterozygosity) [
57]. The only feasible way, however, of estimating
Ne is to rely on measures of within-population nucleotide diversity at neutral genomic sites, such as silent sites in codons (dS) [
58]. While dependence of heterozygosity on
Ne is necessarily true for isolated populations of the same species (same mutation rate per individual), it is not entirely clear whether the use of such measures can be applied to whole taxonomic groups for comparative studies [
59]. Absolute rates of silent site divergence, for example, are 7 times faster in angiosperms than in gymnosperms [
60], which might (or might not) affect biological conclusions based strictly on
Ne.
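The dependence of such estimates on the mutation rate can be seen from the standard neutral expectation for a diploid population, π = 4·Ne·μ, where π is nucleotide diversity at silent sites and μ is the mutation rate per site per generation. The sketch below assumes that relation; the function name and all numerical values are hypothetical and chosen only to show the sensitivity.

```python
def effective_population_size(pi_silent, mu):
    """Estimate Ne from silent-site diversity via pi = 4 * Ne * mu (diploid, neutral)."""
    if pi_silent < 0 or mu <= 0:
        raise ValueError("pi_silent must be >= 0 and mu must be > 0")
    return pi_silent / (4.0 * mu)


pi_silent = 0.004          # HYPOTHETICAL silent-site nucleotide diversity
mu_slow = 1e-8             # HYPOTHETICAL mutation rate per site per generation
mu_fast = 7 * mu_slow      # a lineage mutating 7 times faster (illustrative)

print(f"slow lineage: Ne ~ {effective_population_size(pi_silent, mu_slow):,.0f}")
print(f"fast lineage: Ne ~ {effective_population_size(pi_silent, mu_fast):,.0f}")
```

With identical diversity, a sevenfold difference in the assumed mutation rate yields a sevenfold difference in the inferred Ne, which is why comparisons across taxa with different mutation rates need care.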
Consistent with the earlier studies in the 1970s, other biological features such as genome stability also seem to be highly associated with evolutionary rates. Rates of genome evolution appear to be closely correlated with levels of species richness. In mammals, a strong correlation between species richness and karyotype diversity was first reported in 1980. The author proposed that: “properties of stable or unstable karyotype may indicate that the cytological factors of importance are all of a submicroscopic nature.” [
61]. Indeed, the phylogenetic trees of mammals, frogs and urodeles show significant differences in species richness among the three different taxa when accounting for the fossil record (
Figure 1).
Among the three taxa, Urodela have the fewest species (816 newts and salamanders; time of emergence: 230 Mya [62]). Anurans are substantially more speciose than salamanders (7,682 frogs and toads; emerging 180 Mya [63]), and Mammalia have a similar number of species (6,495, of which about one third, or 2,276, belong to Rodentia). Mammals first evolved 225 Mya, but experienced a rapid adaptive radiation 65.8 Mya among placentals, much later than the anuran radiation [64]. Hence, salamanders are evolving more slowly than frogs, which are evolving more slowly than mammals [65, 66].
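A rough way to compare these lineages is the pure-birth estimate of net diversification rate, r = ln(N)/t, where N is the number of extant species and t is the clade age. The sketch below applies it to the species counts and ages quoted above. It is a crude estimator that ignores extinction, and using the 65.8 Mya date for all mammals is a simplification, since that date refers to the placental radiation.

```python
import math

# Species counts and ages (Mya) as quoted in the text.
clades = {
    "Urodela": (816, 230.0),
    "Anura": (7682, 180.0),
    "Mammalia (origin)": (6495, 225.0),
    "Mammalia (placental radiation)": (6495, 65.8),
}


def net_diversification_rate(n_species, age_my):
    """Pure-birth estimate r = ln(N) / t, in lineages per lineage per My."""
    return math.log(n_species) / age_my


for name, (n_species, age_my) in clades.items():
    rate = net_diversification_rate(n_species, age_my)
    print(f"{name}: r = {rate:.3f} per My")
```

Under this estimator salamanders diversify more slowly than frogs, while the position of mammals relative to frogs depends on whether the 225 Mya origin or the 65.8 Mya radiation is taken as the starting point.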
The question emerges from these and other observations: what are the submicroscopic factors that might explain the correlations between SR and KD and the manifest differences in SR and species evenness in the respective phylogenetic trees—assuming that those cellular and presumably nuclear factors and mechanisms are genuinely associated with the correlations and their respective differences? If that assumption holds true, to what extent then would those yet unidentified factors contribute to—or contrast with—the prevailing view that most if not all speciation and adaptive radiations are attributable to ecological speciation alone instead of to NARs resulting from DNA damage, mutation and diversification? [
16,
67]. What these submicroscopic factors might be remains unknown.
4. DNA Damage Detection and Repair Systems (DDR) and Chromatin Structure
Genomes with larger amounts of functional DNA (number of genes, regulatory sequences, etc.) are expected to have lower mutation rates; yet larger genomes are more prone to DNA damage and mutation. The apparent paradox can be resolved by noting that the eukaryote genome is compartmentalized into two broad and varying forms of chromatin: euchromatin (EC) and heterochromatin (HC) [
68]. Euchromatin is characterized by large DNA loops that are more accessible to regulatory enzymes and are richer in genes. Heterochromatin, facultative or constitutive [
69], is more compact, has a much lower gene density and is more refractory to enzymes involved in DNA metabolism (replication, transcription and repair).
This spatial compartmentalization also imposes temporal compartmentalization according to a replication timing (RT) program [
70]: EC replicates early during the S-phase of the cell cycle and HC replicates late. Late replicating DNA protects the genome and cell against mitotic catastrophe, or premature entry into the mitotic phase, which would damage unreplicated gene dense EC and cause apoptosis. Late replicating DNA also serves as a substrate for the ATR/ATM checkpoint system that mediates DNA repair by inhibiting the activation of late and/or dormant DNA replication origins until the cell is ready to recover at mid S-phase from DNA damage. The S-phase and G2/M-phase checkpoint proteins Chk1 and Chk2 govern these functions and organize a multi-factorial cell cycle replication timing program.
Importantly, this temporal compartmentalization corresponds to the differential employment of the two main eukaryote DNA repair systems: error free homologous recombination (HR), which operates more efficiently in the open euchromatin that replicates early in S-phase, and error prone non-homologous end-joining (NHEJ), which operates throughout S-phase but dominates in M and G1 phases [
71,
72,
73]. The ratio between HR and NHEJ decreases with genome size across eukaryotes: species with larger genomes rely more heavily on NHEJ than do species with smaller genomes [
74]. Consequently, they tend to have much larger introns and higher intron density [
75,
76].
Thus, individual mutation events, although assumed to be randomly occurring, are not randomly distributed across the genome [
77]. Several studies have established that rates of mutation depend highly on the replication timing (RT) of subregions of the eukaryote genome. The RT program therefore serves to limit mutation rates in gene rich EC: mutation rates are significantly higher in late replicating HC [
78,
79,
80,
81,
82,
83,
84], suggesting that mutation rates in early and late replicating DNA are anti-correlated to a degree directly proportional to the quantity of late replicating HC relative to early replicating EC. Error-prone DNA polymerases, the Y-family of translesion polymerases, might also account for the higher mutation rate in late S-phase. The elevated mutation rates in late replicating DNA in yeast, for example, are suppressed when DNA translesion polymerases are rendered inactive [
79].
Not surprisingly, the functional identity of genes (gene ontology) is also unequally distributed across the genome. Essential housekeeping genes, required for the survival of all cells, are universally early replicating, while adaptive genes, such as the olfactory complex, are generally late replicating and located in or near heterochromatic domains [
85,
86]. Speciation genes tend to be non-essential in contrast to housekeeping genes [
87].
The late replicating status of non-essential mutation prone speciation/adaptive genes remains to be firmly established, but some studies strongly suggest that the epigenome biases mutation [
88,
89], which might promote the adaptive functions exhibited by immune system genes, which are late replicating, and other ecologically responsive, or sensitive, genes. The generation of genetic diversity by promoting DNA damage in late replicating genes can thus be viewed as analogous to the programmed DNA damaging processes involved in the generation of antibody diversity in the immune system [
90].
5. Limb Regeneration, Cell Differentiation, Development and Aging
Species with large C-values have longer introns and correspondingly slower rates of transcription, a phenomenon known as “intron delay” [
91,
92]. Consequently, they have much slower cell and life cycles. Other features associated with species with either large or more stable genomes are long maximum lifespans (MLS), slow rates of development and in some cases the ability to regenerate ablated tissue [
93]. Salamanders, for example, can regenerate not only limbs but also internal organs including the brain [
94]. Tissue regeneration depends on a strong DNA damage response system that converges on the cell cycle checkpoint regulators Chk1 and Chk2: inhibition of Chk1 and Chk2 impairs regeneration [
95].
Given the role of the heterochromatin-DDR complex in RT, DNA repair, cell cycle regulation, limb regeneration and aging, it is not surprising that an embryonic state of chromatin also facilitates experimental cloning of animals [
96]. This might suggest that the limb regeneration and slow aging phenotypes in salamanders are associated with the substantially larger amounts of heterochromatin in their genomes compared to other species with smaller genomes. Obligate neotenes consistently have genomes much larger than metamorphic or direct developing salamanders [
97,
98]. Larger amounts of heterochromatin might therefore facilitate DNA repair, slow the rate of aging, enhance MLS and retard developmental rate.
DNA and histone methylation are features of heterochromatin, and are associated with developmental genes, gene regulatory regions and the polycomb repressive complex 2, a histone methyltransferase associated with repressed, transcriptionally silent facultative heterochromatin and X-chromosome inactivation [
14]. Histone methylation also participates indirectly in the DDR [
99]. Additionally, epigenetic drift involves the erosion of CpG methylation, and is closely associated with aging: higher densities of CpG methylation buffer against epigenetic drift and extend MLS [
51]. Other important chromatin modulators such as Sirt6 are also involved in the HC-DDR complex and influence mutation rates and aging [
100]. The link between heterochromatin, genome stability and aging perhaps can be extended to rates of speciation [
101,
102,
103].
When DNA damage occurs, cells face three possible outcomes depending on the amount of damage: 1) checkpoint mediated cell cycle arrest and DNA repair (DDR activation), 2) cellular senescence (aging) and 3) apoptosis (programmed cell death). A fourth fate involves cellular differentiation [
104,
105]. Apoptosis is an integral feature of both the DDR and the cellular differentiation that drives embryogenesis and development [
106]. Chk1 is activated, for example, at the midblastula transition during embryogenesis when the cellular transcription program is switched on [
107]; and it acts to extend the cell cycle and initiate cellularization in the developing embryo [
108].
It has been claimed that cellular differentiation, a feature driving the evolution of metazoans, emerged as a defense against lethal DNA damage and oncogenesis [
105]. The idea is that cellular differentiation is an evolutionary adaptation to DNA damage and a prophylaxis against oncogenesis in metazoans. This raises an interesting question: are rates of evolution constrained by rates of development? Rates of development in salamanders, for example, are constrained by a nucleotypic effect relating to genome size [
109]. If so, could speciation rates scale with the timing of the program of differentiation and development in the individual organism?
Additionally, the lower levels of DNA damage and the stronger DDRs associated with higher levels of heterochromatin might contribute to the slower rates of evolution observed in Urodela compared to Anura and Mammalia. Such a relationship is also apparent within the Urodela lineage: species richness at the family level taxonomic clade is negatively correlated with C-value. Although the latter observation remains to be rigorously established, slow aging and longer developmental programs, which result in longer MLS, provide the organism with more time to repair DNA damage, thus promoting the efficiency of DNA repair and enhancing genome stability by reducing mutation/substitution rates.
If mutation rates set rates of speciation as Kimura’s NAR hypothesis proposes and if mutation rates vary substantially across animal and plant lineages, the DDR and HC must play important roles in determining speciation rates across the Tree of Life (
Figure 3). Speciation rates might indeed be related to developmental rates [
110], a question that has long intrigued evolutionary and developmental biologists. This would suggest that, in more than just a metaphorical sense (though not exactly in a literal sense), “ontogeny recapitulates phylogeny.” [
111,
112]. It would be interesting, nonetheless, to investigate how rates of speciation and phylogenesis scale with rates of development and ontogenesis should it turn out that the DDR and heterochromatin are in fact limiting for cell cycle progression and mutation [
113,
114,
115,
116].
Discussion
This review has attempted to adumbrate some of the various mechanisms by which heterochromatin and DNA repair might play a role in maintaining genome integrity and stability, biological features that are increasingly associated with rates of speciation and adaptive radiations [
117]. The central question addressed here concerns to what extent molecular mechanisms mediating genome dynamics determine rates of evolution in parallel to, or even in concert with, gene specific mutation rates.
Mutation rates in vertebrates, for example, are very similar to rates of TE transposition [
118,
119,
120], which is regulated by heterochromatin and ecological variables that shape phenotype plasticity. A role for TE activity in punctuated equilibria has also been suggested [
121]. Heterochromatin, however, might not be in and of itself a determining factor of SR and KD, but instead might operate more indirectly through the multiple pathways, both molecular and ecological, that affect and influence evolutionary outcomes. It has now become clear, however, that heterochromatin plays vital regulatory roles in RT, DNA repair, transcription and development. Its role in speciation merits further investigation.
NAR, in its molecular formulation, might imply a biphasic mode of evolution: 1) a lag period of drift involving chromosomal and genome rearrangements in a neutral niche occupied by an ancestor population (stem group), followed by 2) niche diversification and neo-Darwinian positive selection on adaptive genes resulting in ecological speciation (crown group) [
122,
123] (
Figure 4). The fact that synteny is highly conserved in salamanders, frogs and mammals while rates of structural change in genes are fairly constant supports the proposal that karyotypes evolve neutrally whereas the transcriptome and corresponding phenotypes evolve according to positive (and purifying) selection. It is also notable that synteny is correlated with MLS in mammals (unpublished).
Both features, karyotype diversification and genetic diversification, might contribute successively or in tandem (and in concert) to the processes of reproductive isolation and adaptation. Might there then be two distinct molecular clocks determining the mode and tempo of evolution: a gene based molecular clock that sets a constant rate of genetic evolution across lineages, and a genome/junk based molecular clock that sets a given rate of speciation that varies from lineage to lineage? [
124,
125].
The central tenet of Kimura’s NAR hypothesis relies on the assumption that mutation rates directly influence substitution rates (mutation-substitution balance) and therefore speciation rates. Ecological speciation, in contrast, rests on the assumption that environmental shifts acting on functional DNA alone (or predominantly) determine speciation rates. It has been repeatedly found in every organism examined so far (including salamanders) that substitutions at non-silent sites in gene codons (amino acid substitutions) are correlated with substitutions at silent sites, suggesting that selection acts not only on genes but also on gene locations and regions in the genome (e.g., early vs. late replicating DNA, heterochromatin vs. euchromatin) [
88].
This raises an interesting, perhaps provocative question: To what extent do mutation rates and DNA repair efficiencies influence, or set, substitution rates—and hence speciation rates—independently of ecological selection? It has been pointed out that “locational selection would have to be realized through the influence of the local mutation rate on the amino acid changing mutation rate” [
88]. If this hypothesis is correct—selection based on gene location—and if it is a reflection of the non-random distribution of DNA damage events, it would not be unreasonable to expect that such a relationship/correlation between gene location, DNA damage and DNA repair efficiency would apply not only within genomes, but also across taxa (salamanders vs. frogs vs. mammals) in a manner that sets variations in speciation rates within lineages and explains, at least in part, the striking differences in species richness and evenness observed in the Tree of Life.