2.1. Identification and Characterization of RNAi-Related Genes in Rosaceae
The identified PpAGO, PpDCL, and PpRDR genes were named according to their phylogenetic relationships with the
Arabidopsis thaliana sRNAs biogenesis genes (
Figure 1). A total of 11 PpAGOs were detected in the peach genome based on the structural integrity of their conserved domains. In addition, phylogenetic analysis was performed to evaluate the relationships between peach and
Arabidopsis AGO proteins (
Figure 1A). All the peach AGO proteins clearly form three clades: Clade I (AGO1/5/10), Clade II (AGO2/3/7), and Clade III (AGO4/6/8/9). Notably, despite having no orthologous members in the peach genome, three
Arabidopsis proteins, including AGOs 8, 9, and 3 were clustered with other peach AGOs. They also shared similar physicochemical properties, such as amino acid length, with other peach AGOs in the same clade. Furthermore, the isoelectric point (
pI), molecular weight (MW), and sequence length were analyzed for each identified gene (
Table 1).
The phylogenetic analysis was performed for peach DCLs (
Figure 1B). PpDCLs were divided into four clades (I, II, III, and IV), and each DCL was clustered and named according to the clade.
Based on the phylogenetic relationship between the peach and
A. thaliana proteins, RDR proteins were divided into four clades (I, II, III, and IV) (
Figure 1C). Clade I (RDR1) had the largest members of
PpRDRs: PpRDR1a, b, c, d, e, and
f. Each of Clade II (RDR2) and Clade III (RDR6) had only one
RDR gene member. In Clade IV,
AtRDR3 and
AtRDR4 were grouped with
AtRDR5, which blasted into one
PpRDR5 (Prupe.7G221200).
To perform genome-wide identification of the AGO, DCL, and RDR gene families in the Genome Database for Rosaceae (GDR), all Hidden Markov Model (HMM) profiles of the conserved domains were gathered, and the identities of AGO, DCL, and RDR conserved domains were examined. A total of 97 AGOs, including 11 genes from peach, were identified in the genomes of the eight tested Rosaceae species (
Table S1). Specifically, 10, 18, 10, 16, 14, 10, and 8 members were observed in strawberry (
Fragaria vesca), China rose (
Rosa chinensis), black raspberry (
Rubus occidentalis), apple (
Malus × domestica), pear (
Pyrus communis), almond (
Prunus dulcis), and Armenian plum (
Prunus armeniaca), respectively, as shown in
Figure 2A. To determine the evolutionary relationship among the orthologues of Rosaceae AGO proteins, a comprehensive neighbor-joining (NJ) phylogenetic tree was constructed. Notably, Clade III was the largest in the Rosaceae family with 38 members, while Clade II was the smallest with 25 members. The AGO4 members were split into two subclades, AGO4a and AGO4b, suggesting their expansion among the tested Rosaceae species. Additionally, the AGO8/9 and AGO3 subclades appeared to have been lost during the evolutionary process in the tested Rosaceae species.
In our previous study, a total of eight PpDCL genes were identified in the peach genome [
33] (
Table S2). For RDRs, the total of nine PpRDR members were equally grouped into four major clusters (
Figure 2B). Furthermore, a total of 112 RDR transcripts were identified from the eight selected Rosaceae genomes (
Table S3). Excluding the nine PpRDR copies, a total of 77 non-redundant RDRs were retrieved for further analysis, including 4, 13, 12, 9, 6, 10, and 14 copies from
F. vesca,
R. chinensis,
P. armeniaca,
P. dulcis,
R. occidentalis,
P. communis, and
M. x domestica, respectively. All the obtained Rosaceae RDR genes were displayed in four clades, each clade was outgrouped with
AtRDR. Notably, the PpAGO5 genes were clustered with
AtAGO5, which is included with other
AtRDR3 and
AtRDR4 subfamilies, according to higher sequence similarity with
AtAGO5 (
Figure 2B).
2.2. Chromosomal Localization, and Evolutionary Analysis in P. persica
The localization of predicted AGO, DCL, and RDR genes across the eight chromosomes in the peach genome was performed through a BLAST search to determine the physical location of each gene. As a result, the 11 AGO genes were unevenly distributed across the eight chromosomes (
Figure 2C). Notably, Chr2, Chr4, and Chr5 contained three duplicate paralogous pairs: PpAGO4b/PpAGO4a, PpAGO2b/PpAGO2a, and PpAGO1b/PpAGO1a, respectively, while Chr1 and Chr6 contained two AGO genes each, including PpAGO10, PpAGO7, and PpAGO5/PpAGO4c, respectively. The
Arabidopsis AGO4 gene mapped to three duplicated members in the peach genome, including PpAGO4a, PpAGO4b, and PpAGO4c, with the latter two genes being tandemly duplicated. The adjacency of these genes indicates that they originated through tandem duplication events, suggesting that tandem duplication is a key evolutionary process driving the expansion of this gene family in the peach genome.
The distribution of DCL genes in the peach genome showed that the eight predicted DCL genes could be mapped on five out of the eight chromosomes (
Figure 2C).
The nine identified PpRDR genes are located on four chromosomes (Chr1, Chr4, Chr5, and Chr7). Chr1 contained the majority of PpRDR genes (five genes), although it is the longest chromosome. The results showed that PpRDR genes, such as the paralogous pair of PpRDR1a (Prupe.4G078900) and PpRDR1b (Prupe.4G078800), were closely located on Chr4. The existence of homologous genes in the same location indicates that these genes originated by tandem duplication, giving rise to paralogous genes. Tandem duplication is frequently regarded as a primary factor driving various biological functions. Similarly, triple duplications of PpRDR1c, PpRDR1d, and PpRDR1e (Prupe.1G334600, Prupe.1G334500, and Prupe.1G332600, respectively) were located on Chr1. The results indicate that duplicated genes may have complex phylogenetic structures due to variations in their evolutionary alterations.
The duplication of PpRDR genes is centralized on four chromosomes, making it interesting to study the characteristics of phylogenetic relationships in different species. Triple copies or multi-gene sets of putative orthologs may contain paralogs that have not been detected. Notably, some PpRDR genes, such as PpRDR3 and PpRDR5, are absent, as shown in the phylogenetic tree (
Figure 2C) and NJ clusters (
Figure 1C). This absence might be due to high similarity between the reference genome sequences of
AtAGO and
AtRDR and their peach counterparts, leading to the absence of PpRDR3 and PpRDR5 in the BLAST results. The similarity between these genes includes molecular properties such as sequence length (992 and 977 amino acids) and molecular weight (112.8 and 110.9 kDa), respectively.
AtRDR3 and
AtRDR5 cluster as siblings with
AtRDR4 in the same clade. This high similarity may explain why PpRDR3 and PpRDR5 were not detected in the peach genome, possibly due to data processing deletions or resulting isoform copies of genes.
2.3. Gene structure and Motif Analysis in P. persica
To further validate and gain insights into the potential activities of the identified peach sRNAs biogenesis proteins, we analyzed their gene structure and conserved regions involved in RNA binding, enzyme catalysis, and other critical features. Previous studies have demonstrated that these predicted domains play crucial roles in protein activity in plants [
34,
35]. The functional domain analysis revealed that most peach AGO, DCL, and RDR proteins are highly conserved (
Figure 3)
Peach AGO proteins are primarily characterized by three domains: PAZ, MID, and PIWI (
Figure 3A), consistent with previous studies [
36,
37,
38]. It has been reported that the PAZ and PIWI domains play critical roles in RNase activity in AGOs [
12,
39,
40]. Both PAZ and PIWI domains were detected in all the putative PpAGO proteins, showing similarities with their
Arabidopsis orthologs. The PAZ domain is essential for binding the 2-nt 3' overhang of sRNAs, while the PIWI domain of certain AGOs has RNase activity [
35,
41]. The MID domain anchors the 5' phosphate end of sRNAs onto Argonaute proteins [
35,
39,
42,
43,
44,
45]. The putative PpAGO6 (Prupe.3G209300) was highlighted for structural prediction and generic domain structure analysis of AGOs. PpAGO6 has a protein length of 898 amino acids. The PAZ domain is located between residues D283 and S381 (99 amino acids in length), while the PIWI domain is located between F550 and K859 (310 amino acids in length). Both domains showed identical homology with RNase H, which binds to the 5′ end of the siRNA of the target RNA and cleaves it, demonstrating that sRNAs are complementary sequences [
46,
47].
The residues of AGO domains are involved in sRNA binding, sorting, and sRNA-targeted RNA pairing. sRNA sorting into different AGOs depends on features such as sRNA length and 5′ end nucleotide type [
48,
49,
50]. sRNA 5′ terminal nucleotide of sRNA is recognized by the nucleotide specificity loop within the MID domain (
Figure 3A). The MID domain is reported to be the main component of AGO with crucial functions in RNA silencing [
49,
51]. Additionally, other highly conserved residues or motifs with potential functional importance were detected within the 3′ end of sRNAs. These motifs are labeled as "L" in
Figure 3A.
Dicer-like (DCL) proteins are endonucleases with two RNase III domains [
52,
53,
54]. DCL proteins split both strands near the terminal loop to generate the miRNA duplex, containing the miRNA paired with its passenger strand.
Figure 3D presents PpDCL4 as an example model of peach Dicer-like proteins. PpDCL4 is composed of six functional parts: the PAZ domain (residues D776–V875), two helicase binding fragments, dicer dsRNA-binding residues, and two RNase III domains. The PAZ domain is connected to RNase IIIa on one side, while the helicase C-terminal contacts the PAZ domain from the opposite direction through dicer dsRNA-binding residues. RNase-IIIb follows RNase IIIa in structure. The helicase binding fragments include helicase ATP-binding (M1–S132) and helicase C-terminal (K298–T455), while the Dicer dsRNA-binding domain is located between S482 and E572. RNase-IIIa and RNase-IIIb, located between positions 902–1072 and 1113–1257, respectively, are the main components of DCL domains and interact directly with substrate RNAs. The overall structure of DCL domains resembles a hacksaw [
54].
The structure of PpRDR and the distribution of domains along the PpRDR sequence are depicted schematically in
Figure 3G.
PpRDR2 has a canonical RNA-dependent RNA polymerase (RDRP) domain characteristic of known polymerase structures, with the PDB chain regions playing an important role in the observed structure. The RDRP domain occupies the largest part of the PpRDR2 protein sequence, spanning 579 amino acids in length (I379–V957).
The Protein Data Bank (PDB) format is shown as a standard for files containing atomic coordinates. Using the Swiss-Model software for online analysis, we found that the tertiary amino acid sequences of 28 members were highly similar (
Figure 3B,E,H). In this study, we modeled the amino acid sequences of 28 RNAi-related genes from the three gene families using 3D structural homology. One gene from each family was chosen for modeling, as shown in
Figure 3C,F,I.
The analysis focused on conserved motifs involved in RNA-binding, enzyme catalysis, and other critical features to characterize the identified
P. persica silencing proteins. The predicted PpDCL, PpAGO, and PpRDR protein sequences were aligned with reference sequences of AtDCL, AtAGO, and AtRDR (
Figure 4). Conserved functional motifs in PpAGOs were confirmed, such as Y950, K958, Q978, N991, K995, and I996, crucial for the MID domain's role in sRNA 5′-phosphate-binding [
59], which were fully conserved across peach AGO proteins (
Figure 4A). The PAZ domain residues (G1929, D1931, V1932, and H1934) were also universally present. Furthermore, the RDGVS (1118–1122 aa) motifs were conserved in all AGOs of peach and Arabidopsis. Examination of the PIWI domain revealed conserved residues like E1133, D1191, DE (1224 and 1125), and H1161, implicated in enzyme catalysis [
38,
41,
55]. Further functional analysis is needed to elucidate the specific roles of these residues in PpAGO proteins. The glutamine-phenylalanine-valine (QF-V) motif critical for sRNA duplex recognition and sorting was conserved across all PpAGOs [
37,
38,
56] (
Figure 4A), suggesting their importance in sRNA 3′-end binding within PAZ and PIWI domains.
To assess the functional similarity of peach DCL proteins with Arabidopsis counterparts, we conducted multiple sequence alignments and motif composition analyses. Computational modeling of the catalytic core of AtDCL4 provided insights into the amino acids critical for dsRNA recognition, binding, and cleavage. The alignment revealed conserved RNase IIIa, RNase IIIb, and an RNA binding motif across PpDCL proteins (
Figure 4B). Specifically, the RNase III catalytic sites of peach DCL proteins featured glutamate (E), aspartate (D), aspartate (D), and glutamate (E) (EDDE), analogous to their orthologs in AtDCLs. Key residues include E1513, D1517, D1642, and E1645 for RNase IIIa, and E1737, D1741, D1745, and E1838 for RNase IIIb, with the RNA-binding motif characterized by the H-S loop (
Figure 4B). Additionally, sequence alignment of AtRDRs with putative PpRDR proteins identified the conserved D-DGD catalytic motif, a hallmark of the RDR conserved domain (
Figure 4C).
2.4. Orthologous Similarity and Collinearity Analysis for Non-Coding RNA Genes
Our analysis indicated that most gene pairs had a Ka/Ks ratio <1, signifying strong purifying selection and underscoring their essential roles in plant fitness under stressful conditions. Notably, significant sequence similarities were identified between peach and Arabidopsis AGO orthologs (
Figure 5A). AtDCL1 exhibited the highest similarity score among peach DCLs with its ortholog, scoring ≥ 0.75, followed by AtDCL2 and AtDCL4 with scores ≤ 0.50, while AtDCL3 showed lower similarity with its orthologous genes.
RDR genes were mapped using Blastp protein sequence comparisons [
57], revealing distinct similarity patterns between PpRDRs and their AtRDR orthologs (
Figure 5B). PpRDR1 genes comprised six copies (PpRDR1a to PpRDR1f), each showing high similarity with specific AtRDR counterparts, whereas other AtRDR genes exhibited less similarity with their duplicates. PpRDR2, PpRDR4, and PpRDR6 each showed varying degrees of similarity with different AtRDRs, with PpRDR6 (Prupe.1G480300) demonstrating the highest similarity score among all copies, followed by PpRDR2 (Prupe.5G176700).
Additionally, chromosomal collinearity analyses of AGOs, DCLs, and RDRs (
Figure 5C) revealed extensive conservation between peach and Arabidopsis. Five AGO genomic collinear pairs exhibited shared gene order: PpAGO7-AtAGO7, PpAGO10-AtAGO10, PpAGO6-AtAGO6, PpAGO2a-AtAGO2, PpAGO1a-AtAGO1, and PpAGO5-AtAGO5. DCLs showed lower collinearity with only one pair (PpDCL1-AtDCL1) maintaining collinearity. For RDRs, two collinear regions were identified: PpRDR1d-AtRDR1 and PpRDR4-AtRDR4. These findings underscore a conserved genomic structure between these species across the three non-coding small RNA gene families.
Predicted cis-acting elements in the promoter regions were identified for all candidate AGOs, DCLs, and RDRs genes, classified into three functional groups: phytohormone-responsive, specific expression and stress-related, and light-responsive-related. A total of 41, 40, and 42 promoter cis-acting elements were identified in PpAGOs, PpDCLs, and PpRDRs, respectively (
Figure S1). Predominantly, these elements were associated with specific expression and stress response, highlighting their crucial roles in both biotic and abiotic stress responses (
Figure S1). Notably, the analysis revealed a prevalence of CAAT-boxes and TATA-boxes across all AGOs, DCLs, and RDRs, underscoring their importance in transcriptional regulation. Additionally, TC-rich repeats, LTRs, and MBS promoters known to regulate responses to biotic and abiotic stresses [
58] were prominently represented, indicating their direct involvement in the regulation of sRNAs biogenesis genes, particularly under drought conditions (
Figure S1) [
59,
60].
In AGO genes, promoter regions such as TAG-box and TATC-box, HD-Zip 1, A-box, CCAAT-box, and LAMP-element were specifically detected in PpAGO6, PpAGO7, PpAGO1a, PpAGO2a, and PpAGO10, respectively (
Figure S1A). The analysis revealed consistent patterns of promoter cis-elements across homologous genes. For example, PpAGO1a and PpAGO1b exhibited 11 shared representations and 18 absences out of a total of 41 promoter regions. In DCL genes, MSA-like and MRE elements were identified exclusively in PpDCL3a, whereas A-box, HD-Zip3, AAAC-motif, and AT1-motif were present only in PpDCL3b. Additionally, Box II and chs-CMA2a were found in PpDCL1 and PpDCL4, respectively (
Figure S1B). Similar expression patterns were observed among paralogous genes, with PpDCL2a, PpDCL2b, and PpDCL2c sharing six representations and 18 absences. Moreover, PpDCL3a and PpDCL3b showed 10 shared representations and 6 absences out of a total of 40 promoter regions. In RDR genes, the Sp1 promoter region was identified uniquely in the PpRDR1b gene, while the ATC-motif and Circadian elements were present in PpRDR1c and PpRDR1d, respectively. Furthermore, Box III and ACE promoter regions were specific to the PpRDR1f gene sequence. Conversely, the GARE-motif and LAMP-element were exclusively detected in PpRDR2. Additionally, the TGA-box, AACA-motif, and A-box promoter regions were found only in PpRDR4, while chs-Unit1 m1 was identified in PpRDR6 (
Figure S1C). Similar patterns were observed among the six paralogous genes of PpRDR1, with five shared representations and 11 absences identified in the promoter cis-elements.
2.6. Expression Patterns of AGO, DCL and RDR Genes in Peach under Drought Stress
To predict the functions of AGO genes in peach, we analyzed their FPKM expression across different tissues including leaf, fruit, phloem, root, flower, and seed using available transcriptome data (v2.0.a1) (
Figure 6A). Our findings revealed widespread expression of all peach AGOs across multiple tissues. Notably,
PpAGO2a exhibited highest expression in roots followed by
PpAGO2b, while
PpAGO2b and
PpAGO4c were prominently expressed in leaves, indicating potential roles in drought stress response. Particularly,
PpAGO2b showed predominant expression in leaves, roots, and phloem tissues. These insights contribute to understanding the evolutionary dynamics of
AGO genes in Rosaceae and their roles in peach's response to drought stress.
Time-course expression analysis of
AGO genes under drought stress further demonstrated differential expression patterns across peach tissues (
Figure 6B).
PpAGO4c exhibited peak expression on the first day of prolonged drought treatment but downregulated at subsequent time points (36 hours, 5 days, and 14 days). Conversely,
PpAGO1a,
PpAGO2b,
PpAGO5, and
PpAGO10 showed varying upregulation profiles under prolonged drought conditions. Conversely,
PpAGO1b,
PpAGO2a,
PpAGO4b,
PpAGO4a,
PpAGO6, and
PpAGO7 displayed low relative expression in expression cluster 1 (EC1), suggesting potential co-expression in inducing drought resistance mechanisms under prolonged stress. Notably, phylogenetic analysis clustered
PpAGO4c,
PpAGO4b, and
PpAGO4a closely together, suggesting functional redundancy or neo-functionalization in evolution (
Figure 6B).
Environmental stresses profoundly influence plant gene regulation and adaptation. Stress-related genes are induced under adverse conditions to bolster plant resilience. Expression patterns of
PpDCL genes under drought stress have been previously investigated [
33] and are further detailed in supplementary results (
Figure S2).
RDR genes, crucial in small RNAs biogenesis, play pivotal roles in plant growth and development [
61,
62,
63,
64]. Expression patterns of
PpRDR genes across six tissues (root, fruit, seed, flower, phloem, and leaf) were analyzed to infer their functional roles (
Figure 6C). All
PpRDR genes were found to be expressed in these tissues, each exhibiting distinct expression patterns indicative of their roles in stress responses and development.
To delineate
PpRDR gene functions under drought stress, their expression patterns were analyzed across different time points (
Figure 6D). All
PpRDRs displayed differential expression across expression clusters (EC) in response to short-term and prolonged drought conditions. For instance,
PpRDR1c exhibited stable and elevated expression initially but decreased later, while
PpRDR4 showed dramatic fluctuations in expression levels across various time points.
PpRDR2 and
PpRDR6 showed parallel expression patterns, with
PpRDR6 displaying higher expression in EC2 and EC3. Conversely,
PpRDR1b exhibited the lowest expression among
PpRDR1 copies in EC1, suggesting a potential role as a chronic co-expression gene in drought resilience. The expression of
PpRDR1 copies (d, e, and f) was significantly upregulated under prolonged drought stress from 3 to 12 days, indicating functional redundancy or neo-functionalization within this clade. These findings provide valuable insights into the molecular evolution and adaptive responses of
PpRDR genes in Rosaceae, particularly in the context of drought stress in peach cultivation regions [
6].
PpRDR4 exhibited a distinctive expression pattern characterized by a sharp increase within the first 12 hours, followed by stable expression for the subsequent 12 hours, and a dramatic decline within two days of treatment. It then stabilized for a day until a notable increase was observed on the sixth day, followed by another decline on day 12, and a subsequent increase on day 14.
PpRDR2 and
PpRDR6 displayed similar expression patterns, although
PpRDR6 showed enhanced expression in EC2 and EC3. Conversely,
PpRDR1b exhibited the lowest expression during the initial phase of drought treatment in EC1, while
PpRDR1f and
PpRDR1a showed no significant expression patterns, implying their potential roles as constitutive co-expressed genes in drought resistance (
Figure 6F). Expression of
PpRDR1 copies (d, e, and f) was significantly upregulated from days 3 to 12 of drought stress, whereas other genes from the same clade as
PpRDR1 remained unchanged, suggesting functional redundancy or neo-functionalization from a common ancestor during evolution. These findings offer insights into the molecular evolution of
PpRDR genes within Rosaceae and their likely contributions to drought response in peach, a crop commonly cultivated in irrigated semi-arid and arid regions [
6].
2.7. Quantitative Real-Time PCR (qRT-PCR) Validation of AGO, DCL, and RDR Genes
To gain insights into the roles of
AGOs,
DCLs, and
RDRs in drought stress response, we assessed the expression of
PpAGO,
PpDCL, and
PpRDR genes in two
P. persica cultivars, ‘BJ2-7’ and ‘SN’ peach, known for their differing drought tolerance levels. Five candidate genes from each gene family were selected for qPCR analysis. Our findings revealed that under drought stress conditions,
PpAGO2a and
PpAGO2b genes were significantly induced in the leaves and roots of both cultivars. Notably, their expression was markedly higher in the leaves of ‘BJ2-7’ compared to ‘SN’. In contrast, drought stress inhibited the expression of
PpAGO4c and
PpAGO10 in the leaves of both cultivars, with
PpAGO4c showing higher expression in ‘BJ2-7’ and
PpAGO10 in ‘SN’. Additionally,
PpAGO5 was induced only in the leaves and roots of ‘BJ2-7’ but inhibited in ‘SN’, suggesting a potential role in drought tolerance specific to ‘BJ2-7’ (
Figure 7A).
Among the
DCL genes, 14 days of drought stress induced the expression of
PpDCL1,
PpDCL2a,
PpDCL3a, and
PpDCL4 in the leaves of both cultivars (
Figure 7B). Conversely,
PpDCL2b was downregulated in response to drought stress in both cultivars. Interestingly, the induction of
PpDCL2a,
PpDCL2b,
PpDCL3a, and
PpDCL4 was significantly higher in the leaves of ‘BJ2-7’ compared to ‘SN’. Moreover,
PpDCL2b and
PpDCL4 were induced only in the roots of ‘BJ2-7’ but inhibited in ‘SN’, while
PpDCL1 was inhibited in the roots of both cultivars, albeit with higher expression in ‘BJ2-7’ (
Figure 7B).
In the case of RDR genes, drought stress induced the expression of
PpRDR1b,
PpRDR1c, and
PpRDR1f in the leaves but inhibited them in the roots of both cultivars (
Figure 7C). Interestingly, the expression levels of
PpRDR1b and
PpRDR1f in the leaves, and
PpRDR1c in the roots, were significantly higher in ‘BJ2-7’ compared to ‘SN’. Although
PpRDR2 expression was induced by drought stress in both cultivars, it was inhibited in the leaves of ‘BJ2-7’ while unaffected in ‘SN’. Furthermore, drought stress inhibited the expression of
PpRDR4 in the leaves of both cultivars, whereas its expression was induced only in the roots of ‘BJ2-7’ but inhibited in ‘SN’ (
Figure 7C). These results highlight the differential responses of
AGO,
DCL, and
RDR genes to drought stress in peach cultivars with varying tolerance levels, providing valuable insights into their roles in drought adaptation mechanisms.