1. Introduction
RAMOSA1 (RA1) is a C2H2 zinc finger protein first cloned and studied in
Zea mays (maize), a major crop species [
1,
2]. RA1 is a small protein (175 aa) that localizes to the nucleus despite lacking a canonical nuclear localization signal [
3]. The RA1 sequence is characterized by a single zinc finger domain and two well-characterized ethylene-responsive element binding factor-associated amphiphilic repression (EAR) motifs (xLxLxLx) towards the C-terminus. In
ra1 mutants, long indeterminate branches replace short determinate branches in both the male and female maize inflorescences, suggesting that RA1 controls meristem fate and identity. Protein-protein interaction assays in yeast showed that RA1 interacts with RAMOSA1 ENHANCER LOCUS2 (REL2), a co-repressor homolog of
Arabidopsis thaliana (Arabidopsis) TOPLESS (TPL), to repress the expression of target genes [
4]. Indeed, it is well documented that RA1 and REL2 interact in vitro and in vivo through the two EAR motifs of RA1 and the Lissencephaly type 1-like homology, LisH (CTLH), region of REL2 [
4]. Although RA1 was originally proposed to be a strong repressor of transcription, chromatin immunoprecipitation sequencing (ChIP-seq) experiments in maize showed that it mainly acts as an activator of gene expression that controls several regulatory and developmental pathways [
5]. In addition, recent studies suggest that
cis-acting regulatory elements upstream of
RA1 are key factors in modulating branch meristem determinacy, affecting ear and tassel morphology in maize and inflorescence architecture in grasses [
6]. Indeed, previous studies have found evidence that
RA1 was a target of selection during maize domestication [
7].
So far, the origin and evolution of RA1 in plants remain uncertain. In terms of sequence similarity, RA1 is similar to the Arabidopsis SUPERMAN (SUP) protein [
2,
8]. Functionally, however, SUP acts during floral development to prevent the initiation of supernumerary stamens, whereas RA1 has a central role in inflorescence development and does not seem to participate in floral development [
2,
8]. Overexpression of RA1 (35S::RA1) in Arabidopsis
sup5 mutants fails to restore the number of stamens in the flower [
9]. Likewise, 35S::RA1 transgenic Arabidopsis plants show pleiotropic effects, such as an increase in the size of the reproductive organs due to cell expansion [
10]. These results indicate that the role of RA1 differs from that of SUP. Within grasses, RA1 was cloned in maize and its closest relatives, but it is absent from the genomes of other grasses belonging to the BOP clade (such as rice, wheat, and oat) [
1,
11].
Given (1) the importance of RA1 during maize inflorescence development and domestication, (2) its central role as a regulator of various plant developmental pathways, and (3) the fragmented knowledge about the evolution and role of RA1 in other cereals, we set out to study RA1 in greater depth to better understand the evolutionary history of the protein. To this end, we reconstructed the phylogeny of RA1 in embryophytes to identify homologous and paralogous protein sequences. The phylogenetic results were then analyzed in light of genome and chromosome collinearity. RA1 homologs and paralogs were comparatively characterized with a focus on conserved and divergent motifs along coding and promoter regions, subcellular protein localization, protein secondary structure, and conformational plasticity. We found a complex pattern of gene duplication followed by different patterns of gene copy retention and loss. Retained copies showed partial conservation of the coding sequences and secondary structure, as well as of cis-elements in the promoter regions, suggesting diversification of protein functionality and upstream regulation. The resulting information will be useful for further exploring the mode of action of these proteins.
4. Discussion
RAMOSA1 is a small C2H2 zinc finger transcription factor that has an unquestionable role during maize inflorescence development and domestication [
1,
4,
5]. Indeed, RA1 is the core of the RAMOSA circuit that regulates axillary meristem fate and thereby determines the final architecture of the maize inflorescence. Despite this remarkable role, the origin and evolution of RA1 have remained uncertain. SUP has been proposed as the Arabidopsis homolog of RA1 based on sequence similarity; however, SUP and RA1 have been documented to have different functions [
1,
8,
47]. To link the fragmented, and sometimes conflicting, knowledge on the evolution and functionality of SUP/RA1 proteins, in this work we present: (1) a solid phylogenetic framework to understand the origin and evolution of RA1 in embryophytes, (2) a detailed SUP/RA1 homology clustering among model and non-model embryophyte species, and (3) new insights on the RA1 gene features and protein structure evolution.
We found that SUP has been present at least since the early diversification of embryophytes (land plants). Based on these results, we reconstructed the evolutionary history of SUP/RA1 in embryophytes. The phylogeny presented here suggests that RA1 arose from two successive duplications of SUP, one around the base of Poaceae and the other at the diversification of the BOP and PACMAD clades. The first duplication, around the origin of the grasses, gave rise to the RA1-like A lineage, which is sister to the paralogous RA1-like B and RA1 lineages that arose later, during the split of the BOP and PACMAD clades. The phylogeny is also supported by the genome synteny analyses. Indeed, previous literature and the collinearity analysis presented in this work suggest that the first duplication correlates with the rho WGD of grasses (~100 million years ago) and that the second duplication may correlate with GD that occurred at the base of the BOP/PACMAD separation (~80 million years ago) [
48,
49,
50]. Interestingly, RA1 and its RA1-like paralogs are exclusive to grasses and show different patterns of retention and loss, as is usually observed after genome duplication events [
48,
49,
50]. It is well documented that differential gene loss, or subfunctionalization and neofunctionalization of retained copies, as in the case presented here, may promote morphological, physiological, and ecological diversification, particularly in grasses [
51,
52].
The nuclear localization of RA1 is well described [
3]; however, the localization of RA1-like proteins was unknown. To gain insight into the general role of these proteins, we performed subcellular localization studies. We observed that RA1 and RA1-like proteins localize to the nucleus despite lacking a classical nuclear localization signal [
3]. Our results indicate that RA1 and RA1-like proteins may move to the nucleus to regulate gene expression, like most C2H2 zinc finger proteins.
Duplication of SUP/RA1 correlates with changes in the coding region, secondary structure, and diversification of the binding properties of their promoter regions.
Evolution of RA1 and RA1-like amino acid sequences. We found conserved and divergent motifs along the coding region of the RA1 and RA1-like sequences studied. All sequences studied in this work carry a C2H2-type zinc finger with the QALGGH motif, except the RA1 sequences, which carry the QGLGGH variant. Overall, we observed conserved motifs shared between RA1-like A and RA1-like B and between RA1-like B and RA1 sequences, as well as specific motifs that characterize each lineage. Some of these conserved motifs lie in regions predicted to be disordered and to have protein-binding affinity, suggesting that RA1 and RA1-like have diversified their function through variations in protein-protein interaction.
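The distinction between the QALGGH signature and its QGLGGH variant can be illustrated with a short pattern search. The following is only a minimal sketch: the function name `zf_variant` and the toy fragments are hypothetical and are not sequences or tools used in this work.

```python
import re

# Zinc finger signature described in the text: QALGGH, or the QGLGGH variant.
ZF_SIGNATURE = re.compile(r"Q([AG])LGGH")

def zf_variant(seq):
    """Return 'QALGGH', 'QGLGGH', or None if neither signature is present."""
    match = ZF_SIGNATURE.search(seq.upper())
    if match is None:
        return None
    return "QALGGH" if match.group(1) == "A" else "QGLGGH"

# Hypothetical toy fragments, NOT real SUP or RA1 sequences.
sup_like_fragment = "AAAQALGGHAAA"
ra1_like_fragment = "AAAQGLGGHAAA"

print(zf_variant(sup_like_fragment), zf_variant(ra1_like_fragment))
```

Applied to an alignment, a check of this kind would flag which sequences carry the A-to-G substitution in the signature.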
Interestingly, we identified motifs linked to positive and negative regulation of transcription, as previously identified in
SUP [
53]. Negative elements at the N-terminal of the zinc finger domain (Motif 1) might affect nucleosome positioning in the core promoter region, as was shown in
SUP [
53]. Also, the region around the C-terminus (Motif 2) could be a target site for methylation and silencing of gene expression [
53,
54]. Likewise, results of ChIP-seq and genome-wide analysis of RA1 occupancy showed that RA1 can repress and also activate genes associated with nucleic acid-related processes, such as chromatin, and TFs implicated in cell specification with final effects on inflorescence development [
5].
In particular, in the present study, Motif 2 has been identified as a putative EAR-like repressor motif (xLxLxLx; [
39,
41,
42,
43,
55,
56,
57]). Indeed, SUP has been characterized as a repressor protein whose repression activity resides in its single C-terminal EAR motif [
40]; however, RA1 was described as a repressor protein bearing two EAR motifs: one towards the C-terminus of the protein, as in SUP [
2], and a second one close to the C-terminus of the zinc finger domain [
4]. It was documented that both EAR repressor motifs of RA1 are involved in the interaction with REL2 (a co-repressor homologous to TPL of Arabidopsis) which, in turn, regulates their target genes, promoting the formation of short branches bearing paired spikelets in maize [
4]. Similarly, it has more recently been documented that SUP acts as a repressor of B class genes in the 4th whorl during flower development by interacting with TPL via its single EAR motif located at the C-terminal region of the protein [
57]. Indeed, site-directed mutagenesis of the SUP EAR motif demonstrated that the removal of a single L abolishes the interaction between SUP and TPL [
57]. Interestingly, among RA1 sequences, we found natural mutations of L residues in the Motif 2 located next to the zinc finger domain in sequences of PACMAD species, except those of the Andropogoneae. In addition, the results presented in this work indicate that most RA1 and RA1-like proteins also carry an additional putative EAR motif between the two previously described ones. Experiments on the role of this third putative EAR motif in RA1 and RA1-like proteins have not yet been carried out. Thus, RA1 sequences of Andropogoneae members may have up to three putative EAR motifs. The phylogeny presented here supports the hypothesis of an increase in the number of putative EAR motifs during the evolution of SUP/RA1. Interestingly, it is well documented that a greater number of L residues within the EAR motif, a greater number of EAR motifs along the amino acid sequence, and the regions flanking EAR motifs are all important for stabilizing binding to co-repressors such as TPL [
43]. These results suggest that, given the differences in the number and conservation of L residues of the EAR motifs, RA1 may have evolved towards an increased strength of repression activity through more stable binding to its co-repressor counterpart.
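The xLxLxLx pattern discussed above lends itself to a simple sequence scan. The sketch below counts non-overlapping occurrences of the L-x-L-x-L core and shows how replacing a single L removes a hit. It is an illustration under stated assumptions only: the function name `find_ear_like` and the toy sequence are hypothetical, the flanking x positions are not enforced, and this is not the motif discovery procedure used in this work.

```python
import re

# Core of the EAR-like pattern (xLxLxLx): three leucines separated by single
# arbitrary residues. The flanking "x" positions are not enforced here.
EAR_CORE = re.compile(r"L.L.L")

def find_ear_like(seq):
    """Return (1-based start, matched residues) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in EAR_CORE.finditer(seq.upper())]

# Hypothetical toy sequence, NOT a real RA1 or SUP protein.
toy = "MKRLDLSLGGAAAALELRLPP"
hits = find_ear_like(toy)

# Replacing one leucine (residue 6, 1-based) with alanine mimics the loss of
# an L within a motif; the first hit is no longer detected.
mutant = toy[:5] + "A" + toy[6:]
mutant_hits = find_ear_like(mutant)

print(len(hits), len(mutant_hits))
```

A count of this kind only reports candidate motifs; whether a candidate actually mediates co-repressor binding has to be tested experimentally.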
Changes in secondary structures. Studies on how folding information is distributed along a protein sequence provide information on the formation of different secondary structures, such as α-helices and β-sheets, and on binding properties [
58]. The secondary structure predictions carried out in this work suggest that SUP, RA1, and RA1-like mainly differ towards the C-terminus. These differences correlate with the presence of one to three copies of Motif 2, identified here as putative EAR motifs. Interestingly, EAR motifs were previously described as important for the interaction with co-repressors [
2,
43,
58].
Most zinc finger proteins of plants bind DNA via a short α-helix containing the highly conserved QALGGH sequence [
60,
61]. Interestingly, RA1 orthologs are characterized by having the QGLGGH sequence instead [
2]. It has been suggested that G may act as a helix-relaxing residue that confers unique functional attributes [
2]. Our analysis indicated that the secondary structure of the SUP zinc finger domain consists of a very well-defined ββα motif, as was experimentally determined [
45]. The same structure was predicted for the RA1 zinc finger domain, suggesting that RA1 binds DNA in a similar way to all other structurally characterized C2H2 zinc finger proteins [
45,
61,
62,
63,
64,
65,
66,
67,
68,
69]. The RMSF analysis presented in this work showed that the A-to-G change in the QALGGH motif of the RA1 zinc finger does not affect the mobility of the helix region, in contrast to what was suggested in previous work [
2].
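For readers unfamiliar with the measure, the root-mean-square fluctuation (RMSF) of an atom is the square root of its mean squared displacement from its time-averaged position. The sketch below implements this standard definition with NumPy. It is not the software used in this work: the function name `rmsf` is hypothetical, the coordinates are toy values, and the frames are assumed to be already superimposed on a common reference.

```python
import numpy as np

def rmsf(coords):
    """Per-atom RMSF from an array of shape (n_frames, n_atoms, 3).

    Assumes the frames were already aligned to a common reference.
    """
    coords = np.asarray(coords, dtype=float)
    mean_pos = coords.mean(axis=0)                    # time-averaged position of each atom
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)   # squared displacement per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))               # average over frames, then square root

# Toy trajectory (hypothetical): atom 0 is static, atom 1 oscillates along x.
toy_traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
])
toy_rmsf = rmsf(toy_traj)
print(toy_rmsf)
```

Comparing per-residue RMSF profiles of two variants over the helix region is one way to judge whether a substitution changes local mobility.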
The zinc finger is one of the major structural motifs involved in eukaryotic protein-DNA interaction [
59,
70]. Structural studies on the C2H2 zinc finger have revealed that three positions (determinant residues) in the helical region of the zinc finger participate in major interactions with sequences in target DNA. In animals, the three amino acid residues of the finger that specifically make contacts with the DNA bases occupy the helix positions +2 (Arg), +3 (Asn), and +6 (Arg) and, in plants, two of them correspond to the sequence QALGGH [
55,
61,
71,
72,
73,
74,
75]. In this context, in SUP the QALGGH sequence occupies positions 2–7 of the helix, that is, it includes all three residues (+2, +3, and +6) [
45]. In SUP, position 2 is occupied by a Gln (Q) residue, position 3 by an Ala (A) residue, and position 6 by a Gly (G) residue [
45]. Even though Q and, more infrequently, A residues are reported to act as base determinants, it is difficult to conceive that G can contact a nucleotide base, because it lacks a side chain [
45]. Indeed, it has been proposed that the binding of SUP is performed through residues at relative positions -1 (S), 2 (Q), and 3 (A) [
45]. The G residues at relative positions 5 and 6 could allow the helix to draw closer to the DNA and the residues at the C-terminal of the zinc finger helix, relative positions 9 (N), 10 (V), and 12 (R), may bind the DNA, as in the case of EPF2-5 and EPF2-7 of Petunia [
76]. In RA1, the residue at relative position 3 is a G; therefore, the proposal of positions -1, 2, and 3 as base determinants can be ruled out. Nevertheless, it is still possible that G plays a passive role in bringing the α-helix closer to the major groove of DNA, allowing other determinant residues upstream in the structure to bind the DNA bases [
45]. In the same way, taking the alternative proposal of Isernia et al. [
45], the determinant residues would be 9 (N), 10 (I), and 12 (R), near the zinc finger α-helix C-terminal. The variability of residues in this zinc finger sequence among zinc finger proteins could be related to specificities for different and unique base triplet sequences, generating a huge diversity of target sequences [
61,
75]. Exploring these possibilities requires additional experiments that are beyond the scope of this work.
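The relative helix numbering used above (-1, +1, +2, ..., with no position 0) can be made explicit with a small lookup. In the sketch below, the two fragments are built only from the residues stated in the text, with "X" marking positions the text does not specify; they are illustrative placeholders, not real sequences, and the function name `helix_residues` is hypothetical. For RA1, the QGLGGH sequence is assumed to occupy positions 2 to 7 as in SUP.

```python
def helix_residues(helix, positions, first=-1):
    """Map relative helix positions to residues.

    `helix` starts at relative position `first`; position 0 is skipped,
    following the -1, +1, +2, ... numbering used in the text.
    """
    labels = [p for p in range(first, first + len(helix) + 1) if p != 0]
    table = dict(zip(labels[:len(helix)], helix))
    return {p: table[p] for p in positions}

# Fragments spanning positions -1 to +12; "X" marks residues not stated in the text.
sup_helix = "SXQALGGHXNVXR"   # SUP: -1 (S), QALGGH at 2-7, 9 (N), 10 (V), 12 (R)
ra1_helix = "XXQGLGGHXNIXR"   # RA1: QGLGGH assumed at 2-7, 9 (N), 10 (I), 12 (R)

# Determinants proposed for SUP, and the alternative C-terminal set for RA1.
print(helix_residues(sup_helix, [-1, 2, 3]))
print(helix_residues(ra1_helix, [9, 10, 12]))
```

Laying the positions out this way makes it easy to see that position 3 is A in SUP but G in RA1, which is why the -1, 2, 3 proposal is less plausible for RA1.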
Changes in promoter binding properties. Regulatory motifs in promoter regions serve as recognition sites for TFs that promote the initiation of transcription as well as specific regulation of gene expression [
36]. In general, most of the
cis-acting elements identified in all promoters suggest that these proteins are related to plant growth and development, but via different pathways or interacting with diverse partners.
The analysis of regulatory elements in the potential promoters of RA1 and RA1-like genes identified binding motifs for TFs involved in seed germination (ABI3, [
77]; ABI5, [
78]; FUS3, [
79]), plant development (BPC5, BPC6, [
80]; TB1, [
81]; ARF4, [
82]; ARF16, [
83]; RA1, [
2]; SMZ, TGA9, [
84]; bHLH130, [
85]; SPL14, [
86]; ATHB12, [
87]; AGL27, [
88]; AGL42, [
89]; CDF5, [
90]; ARALYDRAFT_897773, also known as TCP4, [
91]; ARALYDRAFT_496250, also known as TCP5, [
92]; GRF9, [
93]; ANL2, [
94]; PLT1, [
95]; HHO3, [
96]; ABF2, [
97]; DOF3.6, [
98]), response to abiotic stress (DREB1E, [
99]; ERF008, [
100]; ERF055, [
101]; ERF115, [
102]; NAC020, [
103]; NAC045, [
104]; NAC092, [
105]; OsRR22, [
106]), and response to biotic stress (WRKY62, [
107]; WRKY75, [
108]; MYB73, [
109]).
The conserved promoter sites found in the RA1 sequences in this study are consistent with the findings of Strable et al. [
6]. Those authors identified several highly conserved motifs in the RA1 promoter sequences of Andropogoneae species, such as ARF motifs, which were also identified in the present study. Our findings also support the hypothesis that RA1 plays a role in the transition to flowering, the integration of both developmental and environmental cues [
5], as well as the positive or negative regulation of gene expression in specific organs and at certain stages of development [
53]. Additionally, our results suggest that these sites may also be involved in the regulation of hormonal pathways. In line with this, previous RNA-seq and ChIP-seq studies also indicate that RA1 may be involved in the biosynthesis and signaling of gibberellic acid and linked to auxin pathways [
5].
In summary, the phylogenetic reconstruction presented in this work suggests that RA1 arose from two successive SUP duplications during the origin of the grass family and the diversification of grass species. This gave rise to three different grass sequence lineages, namely RA1-like A, RA1-like B, and RA1, most of which have unknown functions. These results indicate that SUP and RA1 are paralogous sequences. The phylogenetic distance and duplication events that separate SUP and RA1 may explain the different roles reported for these proteins. It is interesting to note that most of the studied species retained RA1 and RA1-like proteins in their genomes, indicating their functional importance.
In this report, we have shown that RA1 and RA1-like have diversified their coding regions, which may have led to variations in their protein structure. This suggests that there may be differences in DNA binding patterns and protein-protein interactions among copies. Additionally, each of the conserved copies has diversified regulatory elements in its promoter region, indicating differences in upstream regulation. Overall, our results suggest that RA1 and RA1-like are involved in different pathways of plant growth and development. Therefore, we propose that gene duplication has enabled subfunctionalization and neofunctionalization in the RA1 and RA1-like gene families in grasses.