1. Introduction
Structural Maintenance of Chromosomes (SMC) complexes are an evolutionarily conserved protein family present from bacteria to humans [1]. In most eukaryotes, three SMC complexes have been characterized: cohesin, condensin, and SMC5/6 [1]. These complexes are involved in a plethora of functions, including mitotic and meiotic chromosome condensation, sister chromatid cohesion, accurate chromosome segregation, DNA replication and repair, genome compartmentalisation, and transcriptional regulation. All SMC complexes share structural features: each is composed of three core proteins (two SMC proteins and a kleisin subunit) and peripheral subunits, forming a ring-shaped structure [1,2].
The cohesin complex is arguably the best-studied SMC complex. In mammalian cells, the cohesin complex comprises two SMC proteins (SMC3 and SMC1A or SMC1B), an alpha-kleisin subunit (RAD21, RAD21L, or REC8), and a stromal antigen protein (STAG1, 2, or 3) [2]. Four of these subunits (REC8, RAD21L1, SMC1B, and STAG3) have an almost exclusively meiotic expression and are therefore referred to as meiotic-specific cohesins. Hereafter, for the sake of simplicity, SMC3 (present in all cohesin complexes) and the remaining cohesin subunits (expressed preferentially in somatic cells) will be designated non-meiotic cohesins. Cohesin complexes are involved in a number of different mechanisms, from keeping sister chromatids together to contributing to the compartmentalization of chromosomes into topologically associating domains (TADs). Chromosome and nuclear compartmentalization, as well as TAD assembly, are mediated by phase separation. It has recently been reported that a fraction of cohesin associates with chromatin in a manner consistent with bridging-induced phase separation (BIPS, also known as polymer–polymer phase separation) [3,4]. BIPS relies on multivalent protein-DNA interactions that bridge two distinct DNA regions and form a DNA loop, which acts as a nucleation structure for phase condensation [3,5]. In addition, during meiosis, meiotic-specific cohesins mediate Sister Chromatid Cohesion (SCC), Synaptonemal Complex (SC) assembly and synapsis, as well as telomere attachment to the nuclear envelope and telomere maintenance. The essential role of the cohesin complex in many aspects of chromosome biology is supported by the fact that defects in cohesin genes can lead to different diseases in which chromatid cohesion, DNA repair, transcriptional regulation, and genome topology are altered. Mutations in meiotic-specific cohesin genes have been associated with infertility, age-related aneuploidy, and premature ovarian failure [6]. Moreover, mutations in non-meiotic cohesin complex components and in their regulators have been associated with cancer [7,8,9]. Collectively, the disease conditions caused by mutations in these genes are known as cohesinopathies. Among these, Cornelia de Lange syndrome (CdLS) is the most frequent and best-known entity [10,11]. CdLS is a malformative syndrome affecting many organs, in which intellectual and growth retardation are the main phenotypic manifestations [12,13]. Patients require life-long rehabilitation, and about 80% of cases carry mutations in one of the cohesin complex components or in one of their regulators (SMC1A, SMC3, RAD21, STAG1, STAG2, NIPBL, HDAC8) [11,12,14].
In addition to the cohesin complex, most eukaryotic genomes encode two distinct condensin complexes (Condensin I and II) that differ in their non-SMC subunits, in cellular localization, and in their regulation during the cell cycle [15,16,17]. In particular, Condensin I localizes in the cytoplasm and gains access to chromosomes after nuclear envelope breakdown (NEBD), between prometaphase and telophase. Conversely, Condensin II has a nuclear localization and, in mitosis, binds stably to chromatin. Like cohesins, the condensin complexes play a key role in chromosome condensation, assembly, and segregation during mitosis and meiosis [18,19,20]. Condensins have also been associated with pathological conditions, as mutations in condensin subunits result in microcephaly due to impaired DNA decatenation [21,22].
The third member of the SMC family, the SMC5/6 complex, has important functions in DNA repair by recombination, but also contributes to genome stability and dynamics in undamaged cells [23,24]. Furthermore, by preventing the accumulation of toxic recombination intermediates, SMC5/6 promotes correct mitotic and meiotic chromosome segregation [23,24]. As in the case of cohesins, protein levels of SMC5/6 components decrease with age in mouse oocytes [25]. It was thus speculated that, in humans, reduced SMC5/6 availability may be associated with the increased risk of chromosomal abnormalities and infertility linked to maternal age. Moreover, mutations in NSMCE2 or NSMCE3 have been described in patients with primordial dwarfism, extreme insulin resistance, and gonadal failure [26], and in patients with lung disease immunodeficiency and chromosome breakage syndrome (LICS) [27]. Finally, the complex acts as a host restriction factor, inhibiting the transcription of the genomes of different viruses (HBV, unintegrated HIV-1, HSV-1, HCMV, KSHV, and HPV) [28,29,30,31,32,33,34,35,36].
Due to their essential functions and association with pathological conditions, SMC complex proteins would be expected to evolve under strict evolutionary constraint. Nevertheless, King and colleagues [37] recently observed signatures of recurrent positive selection in the Condensin II and mitotic cohesin complexes across Drosophila and mammals. They also suggested the presence of an evolutionary arms race driven by viral infections.
To better understand the selective events underlying the evolution of genes that encode SMC complex proteins, we analyzed the selective patterns of all the proteins that contribute to the formation of Cohesin, Condensin, and SMC5/6 complexes.
4. Discussion
Large-scale three-dimensional rearrangements of chromosomal DNA drive and facilitate diverse genomic processes, from chromosome segregation to gene expression, DNA repair, and recombination. SMC complexes are involved in these fundamental processes of genome organization; they are essential for all organisms across the tree of life and are deeply conserved in eukaryotes [1]. The importance of these complexes is not limited to mitosis and meiosis, where they are fundamental; they perform different functions throughout the entire cell cycle [16]. The pivotal role played by the SMC components is confirmed by two other pieces of evidence: (i) mutations in SMC genes cause pathological conditions, including some forms of cancer; (ii) some of these genes are targets of natural selection, as previously reported in Drosophila and in some mammalian groups [37,62]. In these studies, evolutionary analyses were conducted on only a limited number of SMC genes. Thus, we aimed to fill this gap by analyzing the evolutionary history of all the components of the SMC complexes, including meiotic cohesins, which had never been analyzed previously. Indeed, given the key role of these genes in the regulation of primary biological processes of the cell machinery, many different selective forces are expected to drive their evolution.
Our observations on the genes of the cohesin complexes are particularly interesting. In these genes, two distinct trends emerge. On the one hand, the mitotic cohesins are highly constrained; on the other, the meiotic cohesins show signals of pervasive positive selection. Indeed, in all cohesin genes with predominantly meiotic expression we identified strong positive selection signals, and the selected sites are significantly clustered within IDRs, supporting growing evidence that IDRs are fast evolving in different systems [67,68,69,70,71,72,73]. Proteins containing IDRs are known to be essential for phase separation (PS), a process that consists in the compartmentalization of proteins and nucleic acids within the cell and plays a role in a wide range of processes, including meiotic chromosome organization, chromosome dynamics, and meiotic sex chromosome inactivation (MSCI) [64,65,66,74]. A series of meiosis-specific events, including programmed DNA double-strand break formation, homologous pairing, synaptonemal complex installation, and inter-homolog crossover formation, take place to ensure successful chromosome segregation. During meiosis, cohesins and chromosomal phase separation are fundamental to these processes. In this light, we suggest that meiotic cohesins may be engaged in an intra-genomic conflict similar to the ones previously described for centromeres, telomeres, and telomere/centromere-binding proteins [75,76,77]. The centromere drive hypothesis posits that selfish centromeric DNA elements promote their preferential inclusion in the oocyte through the recruitment of kinetochore components. Similarly, we previously proposed that selfish subtelomeric DNA elements can influence the directionality of chromosome movements to the centrosome during meiosis, and that this skews their segregation; the fast evolution of telomere-binding proteins would thus serve the purpose of suppressing meiotic drive and restoring equal partitioning [75]. Because cohesins can potentially influence chromosome movement during meiosis, they may also participate in the control of cheating DNA elements to ensure proper segregation. In support of this hypothesis, we detected a significant correlation between the evolutionary rate of meiotic cohesin genes and their upregulation during mouse female meiosis. We thus suggest that cohesins join centromere- and telomere-binding proteins as elements involved in intra-genomic conflicts fueled by selfish elements that promote meiotic drive. MSCI is also considered a driving force for genomic evolution. In particular, germline X chromosome inactivation, which occurs in the germ cells of XY males, has been linked to genetic conflict related to sexual antagonism [78]. Thus, an alternative, not mutually exclusive possibility is that meiotic cohesins are involved in an intra-genomic conflict related to MSCI.
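To make concrete what it means for selected sites to be "significantly clustered within IDRs", the following minimal sketch applies a one-sided binomial test. It is an illustration under stated assumptions, not necessarily the procedure used in this study: the null model assumes each selected site falls in an IDR with probability equal to the fraction of the protein covered by IDRs, and the site positions, IDR coordinates, and protein length are hypothetical values chosen for the example.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def idr_enrichment(selected_sites, idr_ranges, protein_length):
    """Count selected sites inside IDRs and compute a one-sided binomial p-value.

    Assumed null model: each selected site lands in an IDR with probability
    equal to the fraction of residues annotated as disordered.
    idr_ranges holds 1-based, inclusive (start, end) residue coordinates.
    """
    idr_positions = set()
    for start, end in idr_ranges:
        idr_positions.update(range(start, end + 1))
    p_idr = len(idr_positions) / protein_length
    hits = sum(1 for site in selected_sites if site in idr_positions)
    return hits, binomial_upper_tail(hits, len(selected_sites), p_idr)


# Hypothetical example data (not taken from the study).
selected = [12, 40, 77, 130, 141, 149, 300, 512]  # positively selected residues
idrs = [(1, 100), (126, 175)]                      # IDRs cover 150 of 600 residues
length = 600

hits, pval = idr_enrichment(selected, idrs, length)
print(f"{hits} of {len(selected)} selected sites in IDRs, p = {pval:.4g}")
```

A small p-value under this null indicates that more selected sites fall in disordered regions than IDR coverage alone would predict. The test ignores differences in alignment quality or site-wise power between ordered and disordered regions, which a full analysis would need to address.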
The SMC5/6 complex, in addition to its physiological roles in chromosome maintenance (repair of chromosomal DNA, conformational compaction of bound DNA, DNA replication), functions as a host restriction factor against several viruses, including HBV, unintegrated HIV-1, papillomavirus (HPV), and different herpesviruses (KSHV, EBV, HSV-1) [30]. The SMC5/6 complex recognizes and binds viral episomal DNA molecules, inducing their epigenetic silencing. In turn, episomal DNA viruses antagonize the function of the SMC5/6 complex by expressing viral proteins that degrade one or more SMC5/6 components. For example, the HBV HBx protein recruits cellular DNA damage-binding protein 1 (DDB1), which is part of an E3 ubiquitin ligase complex that targets SMC5/6 for proteasomal degradation. This antagonism of the SMC5/6 complex by HBx is an evolutionarily conserved function found in divergent mammalian HBV species [62] and leads to the specific degradation of the SMC5 and SMC6 components [28,29]. A similar function is reported for EBV BNRF1 and KSHV RTA [34,36].
In general, these observations suggest that components of the SMC5/6 complex are engaged in a host-pathogen genetic conflict. Such a conflict ensues when a host restriction factor targets one or more viruses, which in turn evolve counter-restriction mechanisms. The viral proteins mutate to escape restriction by the host factor, which then evolves to re-establish viral restriction. This cycle recurs repeatedly and results in an evolutionary arms race [79].
The arms race with viral pathogens may underlie the positive selection signals identified in two components of the SMC5/6 complex, as both are directly involved in pathogen-host conflict: SMC5 is targeted by HBV HBx for proteasomal degradation, whereas NSMCE4A interacts with the episomal DNA template.
Mammals have two Nse4 paralogs, Nse4a and Nse4b (encoded by NSMCE4A and EID3, respectively), which share two highly conserved kleisin domains. The two proteins are equally efficient at supporting the assembly of a full SMC5/6 complex. Nevertheless, it has been suggested that SMC5/6 complexes containing Nse4a or Nse4b may exhibit different DNA-binding substrate preferences [80]. Indeed, the Nse4a-containing SMC5/6 complex exhibits episomal restriction activity and has been recovered in HBx pull-down experiments. In contrast, the Nse4b-containing SMC5/6 complex is defective in its interaction with the episomal DNA template, supporting our hypothesis that the positive selection signals identified in the NSMCE4A gene (but not in the EID3 gene) arise from a host-pathogen conflict.
An evolutionary conflict between hosts and pathogens could also underlie the positive selection found in NCAPG. By acting on the condensin complex, gammaherpesviruses are able to induce host chromosomal condensation to promote the replication of the viral genome. EBV is known to activate the condensin complex through NCAPG phosphorylation [81]. Specifically, the viral BGLF4 kinase induces NCAPG phosphorylation at the Cdc2 target motifs, suggesting that the viral kinase might induce chromosome condensation by mimicking Cdc2. The Condensin I complex is constitutively present throughout the cell cycle and regulates the state of chromatin condensation, which is relaxed during interphase and is converted into compact rod-like structures (chromosomes) over a short period of time during mitosis. The function of Condensin I must be tightly regulated during the cell cycle, and this occurs through the phosphorylation of its components by different kinases. Three of the four positively selected sites in NCAPG fall within phosphorylation sites; in particular, site 37 corresponds to the residue that is phosphorylated by Casein Kinase 2 (CK2). CK2 is the main kinase that phosphorylates Condensin I during interphase and reduces its supercoiling activity, in contrast to the slight stimulatory effect of mitosis-specific phosphorylation by Cdc2 [82]. We speculate that NCAPG phosphorylation sites other than the Cdc2 sites may also be targets of viral kinases, and that this shapes the effects of natural selection on this gene, in particular on CK2 phosphorylation sites.
In conclusion, we suggest that the natural selection signals identified in SMC complexes may be the result of different selective pressures. For the condensin and SMC5/6 complexes, the data suggest a host-pathogen arms race. In contrast, the evolutionary rate of meiotic cohesin genes could be the result of an intra-genomic conflict similar to that described for centromeres and telomeres.