1. Alternative splicing isoforms as a source of transcriptome and proteome diversity contributing to phenotypic variation
Transcript expression and alternative splicing (AS) are two key pre-translational processes that can generate phenotypic variation in all organisms [1, 2]. While transcript expression levels are largely dependent on the interplay between promoter and enhancer activity regulating transcription rates and the rate of RNA degradation (or decay), AS can alter the transcript structure, leading to modifications of the encoded protein structure that generate different protein isoforms or protein variants [3]. Alternative splicing can also alter the 2-dimensional and 3-dimensional structure of RNA transcripts, with possibilities for altered functionality at the non-coding transcript level.
Despite the close spatiotemporal coupling between transcript expression and AS, these two processes have been considered independent of each other [4, 5]. However, the regulatory relationship connecting these processes remains unclear [6-9]. Because AS can produce new protein variants, it has been suggested that AS is a major source of transcriptome and proteome diversity in eukaryotes, with ultimate impacts on phenotypic variation [10]. Supporting this, a positive correlation has been reported between the percentage of genes subject to AS and organismal complexity, measured as the number of unique cell types [11]. Nonetheless, the role of AS in evolutionary processes such as speciation and adaptation remains largely unexplored [8-10].
Most transcriptome research has tended to focus on the relative expression levels of mRNA transcripts, both spatially and temporally. This emphasis has arisen due to the relative ease of investigating transcript expression levels with sequencing technologies and bioinformatic tools [12-15]. Within the canon of transcriptome research, only a subset focuses on AS variation. Moreover, from a functional viewpoint, there is a paucity of investigations that establish clear functional links between the genome-wide extent of AS over time and space and major phenotypic effects. Hence, it is unclear whether AS is a major source of standing genetic variation that in turn generates phenotypic variation. This lack of clarity likely arises from the experimental challenges associated with functionally characterizing the effects of alternative splice isoforms on phenotype [16, 17]. Understanding the functional effects of AS isoforms remains a complex task [10, 11]. An increasing number of investigations are revealing the growing significance of AS in evolutionary processes, employing advanced techniques such as whole-transcriptome mRNA sequencing [18].
In plants, the domestication process has generated multiple examples of rapid adaptation via AS [19-22]. One example of this is the
EARLY MATURITY 8 (
EAM8) gene in barley [
19], which is an orthologue of the circadian core component
EARLY FLOWERING 3 (
ELF3) in
Arabidopsis thaliana. It has been demonstrated that a mutated version of
EAM8 (
eam8.I), carrying an A to G transition in position 3257 at intron 3 which leads to an AS event with intron retention and a putative truncated protein, is responsible for the early flowering of a barley landrace from the Tibetan plateau, which is a short-season adaptation to high latitudes [
19]. Another example comes from sunflower, where the domestication process (approximately 5,000 years ago) was associated with a high frequency of alternative transcript isoforms generated by AS. The AS analysis of sunflower domestication found both new combinations of ancestral spliced genes and novel isoforms [20, 21]. These examples suggest that AS can be an important component in evolution and domestication, contributing to phenotypic variation within and between natural and domesticated populations.
In addition to phenotypic variation, phenotypic plasticity is also a key force in evolution and adaptation [
23,
24]. While the role of transcript expression is well understood, little is known regarding the potential of AS to generate phenotypic plasticity [
10,
25]. In plants, the association of environment-triggered AS with environmental stress responses suggests that AS could act as a "molecular thermometer" [
26]. However, the role and underlying mechanisms by which AS can produce plastic phenotypes in novel ecological or environmental contexts are largely unexplored.
2. Bioinformatic tools, software, and computational methods to quantify and visualize splicing variants
Transcriptome-wide analyses of AS in tissues and plants subjected to various biotic and abiotic stresses as well as in different cultivars have been performed using high-throughput next-generation sequencing technologies such as Illumina RNA-seq (RNA-seq) [27-29], Pacific Biosciences single-molecule real-time (SMRT) long-read Isoform sequencing (PacBio Iso-seq) [
30,
31], and Oxford Nanopore direct RNA sequencing (dRNA-seq), also called native RNA sequencing [29,32-35]. All these technologies involve sequencing of total cellular RNA [ribodepleted/poly(A)] or chromatin-associated RNA, either as fragments converted into cDNAs (Illumina short reads), as full-length cDNAs (PacBio Iso-seq and Oxford Nanopore cDNA sequencing), or as full-length RNAs (Oxford Nanopore dRNA-seq). Among these, RNA-seq using the Illumina platform has been the most widely used as it is cheaper and yields more reads. Large-scale Illumina RNA-seq research allowed the prediction of AS events [36-39]. However, Illumina short reads have limitations. Transcript assemblies from short reads are often inaccurate, containing large numbers of misassembled transcripts and missing real transcripts [
40,
41]. Also, research has shown that it is difficult to reconstruct splice isoforms and quantify differential expression of isoforms using short reads [
42,
43], which is necessary to determine the nature of the encoded protein and to assess a splice variant’s role [
32,
43]. To overcome these limitations with short reads, PacBio Iso-seq, which provides long reads, has been used for accurate identification of full-length splice variants and other post-transcriptional regulatory events such as alternative transcription start sites and alternative polyadenylation sites [
30]. As compared to Illumina RNA-seq, PacBio Iso-seq provides more comprehensive insights into different splicing events and isoform diversity and tissue/condition-specific splicing regulation. Since 2016, Iso-seq has been used to analyze the splice isoforms in several plants and this has provided a more detailed and in-depth view of numerous novel splice isoforms [30,31,44-47]. The most recent annotated splice isoforms in AtRTD3 (
Arabidopsis thaliana Reference Transcript Database3) were assembled with PacBio Iso-seq and Illumina using RNA from many organs/tissues that were subjected to different stresses [
31]. Oxford Nanopore sequencing, which also provides long reads of cDNAs or RNAs (dRNA-seq), has been increasingly used in recent years to predict splice isoforms and other post-transcriptional processes, including base modifications. Other specialized high-throughput technologies such as Ribo-seq are used to assess the translation of splice isoforms [
48,
49]. Although Iso-seq and dRNA-seq approaches can generate full-length transcript sequences, the major issues are the limited depth in coverage, and high error rates, which generate many mis-annotated transcripts [
30,
31,
50,
51]. Self- or hybrid-correction methods have been used to overcome the effects of sequencing errors in long reads [
30,
31,
51]. Self-correction uses the raw signal and consensus-based calls to reduce errors while hybrid correction uses Illumina short reads to correct errors in the long reads. Despite the shortcomings of each of these methods, global research of AS in plants revealed enormous complexity of plant transcriptomes and their regulation at the co-/post-transcriptional level [29,31,32,52-54]. In plants, pre-mRNAs of about 80% of intron-containing genes undergo AS, an essential regulatory mechanism in many developmental and physiological processes that affects thousands of genes [28,31,55-57]. For example, research has shown that about 25% of genes that respond to cold stress are regulated by AS [
58] and 20 splicing regulators of the SR family produce close to 100 distinct transcripts [59-61]. Intron retention (IR) is the predominant form of AS in plants, whereas exon skipping is the most prevalent AS event in animals [
55,
62,
63]. However, Braunschweig et al. [
64] have shown that IR is highly prevalent in mammals. Research has shown that IR is a regulated process that plays a role in development, stress responses, and disease [64-67].
Accurate reconstruction of transcript isoforms and quantification of the relative abundance of individual splice isoforms are necessary for a comprehensive analysis of transcriptomes and for deciphering the biological functions of individual transcripts. Many computational pipelines have been developed to analyse RNA-seq data to identify AS events, estimate isoform abundance, and detect differential expression of splice variants across tissues/conditions. Some of the tools/pipelines used for AS analysis are shown in
Table 1. These methods use different statistical models, and each has advantages and disadvantages [30,68-70]. Depending on the type of reads (short or long reads) and sequencing platform, different computational methods are used. These methods involve alignment of sequence reads to the reference genome (or reference transcriptome in some cases) and allow detection of specific splicing events (exon skipping, intron retention, alternative 3’ and 5’ splice sites, etc.) and full-length splice isoforms in some cases thereby providing insights into their functional implications. There are also
de novo assembly tools but these methods are highly prone to the assembly of erroneous transcripts [
71,
72]. More recently, machine learning tools especially deep learning methods are being increasingly used to develop models that can accurately predict splicing/AS patterns of pre-mRNAs and gene expression from genome sequences in humans [73-79]. These methods are yet to be applied to splicing analysis in plants. The deep learning models determine splicing determinants directly from the nucleotide sequence [
73], splice site strength in tissues [
74], and the impact of genetic variation on RNA splicing [
74,
75]. These emerging methods offer new ways to predict tissue/condition-specific AS and the effects of genetic variation in plants on the splicing of protein-coding and non-coding RNAs, as well as the biological significance of splicing changes.
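As an illustration of how one of the event types named above can be recognised from annotation alone, the following minimal sketch flags intron retention by comparing the exon coordinates of two isoforms of the same gene. It is not taken from any of the cited tools: the function names, the half-open coordinate convention, and the toy coordinates are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: exons are half-open intervals [start, end) on one strand of one chromosome.
Interval = Tuple[int, int]


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of a single isoform."""
    ordered = sorted(exons)
    return [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]


def retained_introns(reference_exons: List[Interval],
                     alt_exons: List[Interval]) -> List[Interval]:
    """Introns of the reference isoform that lie entirely inside one exon
    of the alternative isoform, i.e. candidate intron-retention events."""
    return [
        (i_start, i_end)
        for i_start, i_end in introns(reference_exons)
        if any(e_start <= i_start and i_end <= e_end for e_start, e_end in alt_exons)
    ]


# Hypothetical toy gene: the alternative isoform reads through the first intron.
reference = [(0, 100), (200, 300), (400, 500)]
alternative = [(0, 300), (400, 500)]
print(retained_introns(reference, alternative))
```

Real pipelines work from read alignments and handle strand, overlapping genes, and partial retention; this sketch only shows the coordinate logic.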
Different genome browsers including Integrative Genomics Viewer (IGV -
https://software.broadinstitute.org/software/igv/), Integrated Genome Browser (IGB -
https://www.bioviz.org/), or UCSC Genome Browser (
https://genome.ucsc.edu/) allow loading of alignment files (BAM files) to visualize read depth across each exon and intron, as well as AS events. The Sashimi plot tool, which is part of the MISO (mixture of isoforms - https://miso.readthedocs.io/en/fastmiso/) software and is also available in IGV (https://software.broadinstitute.org/software/igv/Sashimi), takes RNA-seq alignment files (BAM files) and gene annotations as input and provides a comprehensive view of AS patterns. The output plot shows gene structure including exons and introns, splice junctions, AS events, read coverage, and relative abundance of splice isoforms across tissues/conditions. Isoform expression levels and individual splice events, such as the percent spliced in (PSI) of an AS event across samples, can also be visualized using heatmaps [80]. Absolute quantification of splice isoforms in tissues or in response to signals can also be performed with “Quant AS”, which combines quantitative PCR and digital PCR.
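The percent spliced in value mentioned above can be illustrated with a minimal sketch. It assumes the simplest common definition, in which PSI is the share of reads supporting inclusion of an event among all reads informative for that event. The function name, the sample names, and the read counts are hypothetical, and real tools additionally normalise for the number of junctions and effective lengths.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """PSI (0-100) from read counts supporting inclusion vs. exclusion of one AS event.

    Assumption: counts are already comparable (no length or junction-number
    normalisation is applied here).
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No informative reads: PSI is undefined for this sample.
        return float("nan")
    return 100.0 * inclusion_reads / total


# Hypothetical counts (inclusion, exclusion) for one event in two tissues.
samples = {"leaf": (30, 10), "root": (5, 15)}
psi = {name: percent_spliced_in(inc, exc) for name, (inc, exc) in samples.items()}
print(psi)
```

A matrix of such values, with events as rows and samples as columns, is the kind of input a PSI heatmap is drawn from.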
Table 1.
Some commonly used tools to analyse RNA-seq data for alternative splicing.
3. Mining gene pools for splicing isoforms and diversifying gene functions to obtain novel phenotypic diversity
Alternative splicing allows a gene to encode various proteins because its exons are joined in different combinations, resulting in related but distinct mRNA transcripts. It has been demonstrated that thale cress (Arabidopsis thaliana) uses AS disproportionately as a stress response [28, 89]. Other plants show a splicing memory of an environmental stress such as heat [90], which shapes the response to an increase in temperature. Moreover, a synthesized Brassica hexaploid had significant AS events [91], thus diversifying its gene expression patterns in ways that could improve its adaptability. Furthermore, Zhang et al. [92] indicated that many genes contributing to quantitative traits are likely to be spliced into multiple transcripts, which contributes to variation in these traits.
The availability of both genome and transcript sequences in plants enables a thorough analysis of AS in various species, including crops [93]. Multivariate analysis of transcript splicing (MATS) and replicate MATS (rMATS) are robust and flexible statistical tools that detect differential AS between two RNA-Seq samples [94] or in replicate RNA-Seq data [81], respectively. The synthetic programming of AS patterns, however, remains underused for improving crops [95]. Hence, Pramanik et al. [96] suggested CRISPR/Cas9-mediated engineering for modifying AS with the aim of (de)regulating plant development.
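To make the idea of differential AS between two conditions concrete, the sketch below compares the mean inclusion level of one event across replicates and flags it when the difference exceeds a cut-off. This is deliberately much simpler than, and is not, the statistical model used by MATS or rMATS. The function names, the 0.1 threshold, and the replicate values are assumptions chosen only for illustration.

```python
from statistics import mean
from typing import Sequence


def delta_psi(control: Sequence[float], treatment: Sequence[float]) -> float:
    """Difference in mean inclusion level (PSI as a 0-1 fraction), treatment minus control."""
    return mean(treatment) - mean(control)


def is_differential(control: Sequence[float], treatment: Sequence[float],
                    min_delta: float = 0.1) -> bool:
    """Flag an event whose absolute mean PSI change reaches a chosen cut-off.

    Assumption: a fixed effect-size threshold with no significance test;
    dedicated tools also model read-count uncertainty and replicate variance.
    """
    return abs(delta_psi(control, treatment)) >= min_delta


# Hypothetical PSI values for one exon in three replicates per condition.
control_psi = [0.80, 0.75, 0.85]
stress_psi = [0.40, 0.45, 0.35]
print(round(delta_psi(control_psi, stress_psi), 3), is_differential(control_psi, stress_psi))
```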
Genome-wide mapping led to the identification of thousands of AS mRNA isoforms in thale cress [36]. Most of the AS transcripts were isoforms with premature termination codons, and their levels could shift under abiotic stress. Li et al. [97] surveyed AS affecting reproductive development in young panicles as well as in unfertilized and fertilized florets of rice with the aid of direct RNA sequencing, small RNA sequencing and degradome sequencing. They found 35,317 AS events, of which more than two-thirds were novel, and concluded that AS was significantly related to developmental stages and to complex gene regulation in rice. An RNA-seq survey defined AS patterns and determined that 59.3% of expressed multi-exon genes underwent AS in seedlings, flowers and young developing fruits of tomato [98]. Single-molecule long-read sequencing (Iso-Seq) enabled an integrated transcriptome data analysis that facilitated the investigation of AS in polyploid cotton [99]. This Iso-Seq analysis identified 15,102 fibre-specific AS events and showed that about 51.4% of homeologous genes produce divergent isoforms in each cotton sub-genome.
4. Molecular mechanisms regulating stress-dependent gene-splice variants
Numerous RNA-seq investigations with plants subjected to various biotic and abiotic stresses have revealed that AS of pre-mRNA is widespread. Furthermore, stresses and developmental cues have a profound impact on the splicing patterns of many genes [28,29,31,44-47,59,100-107]. Despite the prevalence of AS and its role in stress responses, the regulatory mechanisms of splicing and functions of most splice isoforms are not well understood in plants. Decoding the splicing code in plants would require a comprehensive understanding of the rules that dictate splice site choice and identification of specific mRNA targets of splicing regulators. A variety of factors including splice site strength, and the presence of exonic and intronic splicing enhancers and suppressors affect splice site choice, and RNA structural features [108-110] also contribute to AS. Limited research with plants has shown that sequence elements are one of the important determinants of splice site choice [111-114]. Interestingly, the alternatively spliced genes are over-represented in functional categories related to splicing regulators and stress responses [
36,
103,
115,
116]. RNA binding proteins such as serine/arginine-rich (SR) and heterogeneous nuclear ribonucleoproteins (hnRNPs) are some of the key regulators of splicing. Alternative splicing of plant pre-mRNAs encoding SR proteins is dramatically altered in response to various stresses [56,59,117-122]. The changes in the levels of these splicing regulators in response to stresses may change the splicing of other pre-mRNAs due to auto- and cross-regulation of splicing [111,123-127]. These investigations suggest that altered ratios of splice variants of splicing regulators in response to stresses may have a role in fine-tuning gene expression at the mRNA and protein level and the adaptation of plants to stresses [
28,
128]. Also, many stress-responsive genes are associated with significant splicing quantitative trait loci (sQTL) in
Arabidopsis thaliana ecotypes, suggesting a role of AS in plant stress responses [
129].
There are several hundred RNA-binding proteins (RBPs) in any given plant species and the precise roles of most of these proteins in co-/post-transcriptional processes are unknown [
130]. Many approaches to identifying the roles of RBPs in splice site choice are available and a comprehensive review of these methods was recently published [
29,
131]; these are therefore not covered in detail here. In animals,
in vitro splicing assays have greatly contributed to our understanding of the roles of spliceosomal and other splicing regulatory proteins in splicing and elucidating steps in spliceosome assembly and spliceosome composition. However, the lack of a robust plant-derived
in vitro splicing system has been a major limitation [
132]. Hence, other biochemical, cell biological, genetic, and genomic approaches are used to understand splicing regulation in plants [
28,
29,
103,
109,
133]. Application of new methodologies such as identification of targets of RNA binding proteins using TRIBE (targets of RNA binding proteins identified by editing) [
133,
134] and targeted isoform degradation with CRISPR/Cas13 variants [
135] may provide insights into the targets of hundreds of uncharacterized RNA binding proteins and help elucidate isoform function. With TRIBE, the RBP is fused to the catalytic domain of an adenosine deaminase acting on RNA (ADAR), so that the targets of the RNA binding protein are edited irreversibly by deaminating adenosine to inosine, which is then read as guanosine in cDNAs [136]; alternatively, the modified inosines can be identified directly with nanopore native RNA sequencing [137]. RNA from the RBP-ADAR-expressing plants is sequenced to identify the RNA targets of the RBP from the editing events. Expressing specific isoforms in the mutant background or degrading specific isoforms using CRISPR/Cas13 variants (e.g. Cas13d and Cas13x) that specifically bind RNA [135] opens a novel and efficient way to study the functions of splice isoforms.
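The read-out described for TRIBE, adenosines in the reference that appear as guanosines in cDNA reads, can be sketched as a simple pile-up count. This is an illustrative simplification rather than a published TRIBE pipeline. The function name, the thresholds, and the toy sequences are hypothetical, reads are assumed to be ungapped and on the transcript strand, and a real analysis would also subtract sites seen in control plants and known polymorphisms.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def editing_sites(reference: str, reads: List[Tuple[int, str]],
                  min_coverage: int = 2, min_fraction: float = 0.1) -> Dict[int, float]:
    """Return {position: fraction of reads showing G} for reference adenosines.

    Assumption: each read is (0-based start, sequence) aligned without gaps.
    """
    coverage = defaultdict(int)
    edited = defaultdict(int)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Only reference adenosines can carry an A-to-I (read as A-to-G) edit.
            if pos >= len(reference) or reference[pos] != "A":
                continue
            coverage[pos] += 1
            if base == "G":
                edited[pos] += 1
    sites = {}
    for pos, depth in coverage.items():
        fraction = edited[pos] / depth
        if depth >= min_coverage and fraction >= min_fraction:
            sites[pos] = fraction
    return sites


# Hypothetical toy transcript and four reads from an RBP-ADAR-expressing sample.
reference = "CCATGACA"
reads = [(0, "CCGTGACA"), (0, "CCGTGACA"), (2, "ATGACA"), (2, "GTGGCA")]
print(editing_sites(reference, reads))
```

Transcripts that accumulate such sites in the fusion line but not in controls would be the candidate targets of the RBP.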
Emerging evidence suggests that stresses/external cues converge on splicing regulators via different signalling pathways. For example, two proteins involved in drought acclimation [Highly ABA-Induced 1 (HAI1), a protein phosphatase 2C, and its interactor HIN1 (HAI interactor 1), an RNA binding protein] interact with the SR family of splicing factors and regulate splicing [
107]. Phytochromes, key light receptors and regulators of many aspects of plant growth and development, interact directly with several splicing regulatory proteins and modulate AS of many pre-mRNAs [103,138-140]. The light- and drought-regulated alternatively spliced transcripts contain GAA repeats [
107,
138] that are known to bind splicing regulators (e.g. SCL33, SCL30, SR45), suggesting that stress signaling pathways could converge on these splicing regulators [
111,
113,
141]. A mutant of SR45, which encodes a splicing factor, showed altered responses to abiotic and biotic stresses [
142,
143]. Like abiotic stresses, biotic stresses also change the splicing patterns of many genes. Recent research shows that pathogen effectors modulate host pre-mRNA splicing by binding to splicing regulators such as serine/lysine/arginine-rich proteins, U1-70K, SR30, SR45, and GRP7, and thereby suppress plant immunity [80,144-146], suggesting that pathogens have evolved effectors that target host splicing components to subvert plant immunity. It has been shown that many splicing regulators and spliceosomal proteins form speckles (also called biomolecular condensates or membraneless organelles) and that stresses alter the dynamics of proteins in these structures, as well as the size/shape of these structures [133,147-155], suggesting that external signals affect pre-mRNA splicing through reorganization of speckles and their constituent proteins. However, the mechanisms of stress-induced reorganization of speckles in plants are yet to be understood. Also, the phosphorylation status of many spliceosomal proteins and regulatory splicing factors is known to play an important role in pre-mRNA splicing [
156] and stresses may alter phosphorylation status and function of splicing regulators.
Until recently, the splicing code has been thought to consist primarily of exonic and intronic sequence motifs that recruit RBPs that either enhance or suppress the selection of nearby splice sites [
55,
157]. However, in recent years, most pre-mRNA splicing was found to occur co-transcriptionally in both plants and animals [
53,
54,
158], suggesting that chromatin state may affect splice site choice and AS. Emerging research provides evidence in support of multiple regulatory mechanisms at the chromatin level (open vs closed chromatin, epigenetic modifications including histone modifications and DNA methylation), and the speed of transcription as key regulators that determine the outcome of AS in plants [
28,
159,
160]. A rice mutant (
OsMet1-2) with impaired DNA methylation altered all types of AS events [
159]. Also, a mutant with reduced histone H3 lysine 36-specific methyltransferase in rice showed altered intron retention events [
160]. In Arabidopsis and rice, open chromatin was found to be associated with intron retention [
161]. Higher speeds of transcription in open chromatin regions provide less time for the spliceosomal machinery to recognize and excise introns co-transcriptionally [
162,
163]. Alternatively, accessible chromatin regions could be the sites of binding for TFs or other regulatory proteins that recruit splicing factors directly or indirectly through chromatin modifications to affect the outcome of splicing [
64,
164]. The rate of Pol II elongation during transcription was shown to be involved in light-regulated AS of splicing factors [
165,
166]. A point mutation in Pol II with increased elongation speed increased splicing, indicating a role for Pol II speed in splicing regulation [
166]. Furthermore, an increase in either of two histone modifications (H3K4me3 or H3K9ac) increased the rate of transcription elongation and lowered co-transcriptional splicing efficiency [
53]. A double mutant,
rz-1b rz-1c, of hnRNP-like proteins showed impaired splicing of nascent RNAs, suggesting that these proteins promote splicing at the chromatin level [
53]. The direct association of RZ-1C with nascent RNAs further supports its role in co-transcriptional splicing [
53]. It has been shown that a shift in temperature alters H3K36me3 levels and AS [
167] and low temperature changes RNA Pol II elongation kinetics and reduces co-transcriptional splicing [
168]. The involvement of chromatin modifiers and a mediator complex in splicing regulation was also reported and some of these proteins interact with spliceosomal proteins [
169,
170]. A phosphoprotein phosphatase required for Pol II occupancy was found to promote intron excision [
171]. Collectively, these investigations indicate that the epigenetic state of chromatin and the dynamics of transcription modulate AS in plants.
One of the key adaptive changes in response to stresses in plants is the post-transcriptional reprogramming of gene expression [
172,
173]. The resulting transcript isoforms fine-tune gene expression in profound ways to cope with stresses [90,128,174-176]. The research discussed above indicates that stresses/external cues, through some yet-to-be-elucidated signalling pathways, converge on splicing regulatory proteins and chromatin architecture to modulate AS. An in-depth understanding of the splicing code in plants and the roles of splice variants will have applications in fine-tuning gene regulation and developing stress-resilient crops, as stresses and developmental cues dramatically alter the levels of splice variants that encode proteins involved in stress responses and plant growth and development [
28,
29,
103].
8. Establishing a platform for cataloguing, curating, and retrieving alternative splicing isoforms and gene expression quantification database across tissues, development, and stress conditions
One of the major challenges in researching AS and transcript expression is the fragmented nature of data, in which AS isoforms and transcript expression data are scattered across many investigations and datasets, often lacking standardized annotations and metadata [
211,
212]. Such fragmentation hinders efficient data retrieval, comparison, and interpretation. Furthermore, inconsistencies in data formats and quality pose additional obstacles for researchers [
211,
212]. To address these challenges, the development of unified platforms for cataloguing, curating, and retrieving AS isoforms and gene expression (GE) quantification data is important. In this context, multiple attempts to generate a unified platform for GE and AS have been published to date [211-216]. The aim of such platforms is to provide researchers with a centralized resource for accessing comprehensive and high-quality data across different biological contexts and investigations.
Accessibility of such platforms requires a focus on user-friendly interfaces, powerful search functionalities, and intuitive data visualization tools. This can allow researchers to better explore and analyse complex AS patterns and transcript expression dynamics. Advanced algorithms and computational tools are being implemented to enable more comprehensive data analysis, allowing researchers to uncover novel insights into AS and transcript expression [211-216]. For instance, the PlantExp platform integrates plant transcript expression and AS profiles from 131,423 uniformly processed publicly available RNA-seq samples that belong to 85 plant species across 24 plant orders [
213]. This platform not only allows researchers to investigate and navigate across AS and transcript expression profiles, but also allows differential and specific expression analysis, analysis of co-expression networks, cross-species expression conservation analysis and easy visualisation of data [
213].
Such platforms not only facilitate data-driven research, but also promote collaboration between scientists working on AS and transcript expression. By integrating fragmented data, ensuring data quality and accessibility, and providing powerful analysis tools, such platforms empower researchers to explore the intricate relationship between AS and transcript expression. In addition, scientists can flexibly customize sample groups to reanalyse publicly available RNA-seq datasets and obtain new insights [211-216].
9. Alternatively spliced circadian clock genes in response to abiotic stress
Circadian clock genes are a key point of regulation for adaptation to new environments and abiotic stress conditions. Alternative splicing plays a crucial role in the regulation of many core clock genes in plants, and represents an important mechanistic link between the core of the circadian clock and diverse environmental inputs [217-222]. One example involves the partially redundant MYB-related transcription factors CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) and LATE ELONGATED HYPOCOTYL (LHY) [223]. Under cold conditions, a non-functional splice variant of LHY, which has a premature stop codon, accumulates [217]. In contrast, production of the splice variant CCA1β, which lacks the MYB-like DNA binding domain owing to retention of the fourth intron, is inhibited at low temperatures. CCA1β interferes with the formation of CCA1α (functional full-length CCA1) and LHY hetero- and homodimers [224]. The overexpression of CCA1β reduces freezing tolerance in Arabidopsis, while the overexpression of CCA1α increases tolerance to cold conditions [224]. Thus, the opposite regulation of LHY and CCA1 under low temperatures has revealed the role of AS in ensuring the balance of LHY and CCA1 during acclimation to low temperatures [217, 224].
In addition to CCA1 and LHY, other clock genes have been shown to display AS related to cold stress. For example, the splice variant of TIMING OF CAB EXPRESSION1 (TOC1β) is increased at low temperatures, while ELF3β is suppressed under the same conditions [220]. During a diel time course at 20°C, the non-fully spliced transcript of REVEILLE 2 (RVE2) accumulates, while plants acclimated to 4°C primarily produce the functional fully spliced transcript [218]. Other abiotic stresses also trigger differential alternative splicing in plants. For instance, heat stress triggers AS, increasing the levels of CCA1β, PSEUDO-RESPONSE REGULATOR7β (PRR7β), TOC1β and ELF3β, while saline conditions do not seem to affect AS of the CCA1, PRR7, TOC1 and ZEITLUPE (ZTL) genes but reduce the ELF3β variant relative to ELF3α, revealing a role of AS in the regulation of ELF3 under salt stress [220]. It has been shown that the splice variants of TOC1 and ELF3 undergo degradation via the nonsense-mediated decay (NMD) pathway, while the splice variants of other clock genes are insensitive to NMD [220].
Multiple spliceosome components are involved in AS of core plant clock genes. For example, the conserved methyltransferase PROTEIN ARGININE METHYLTRANSFERASE 5 (PRMT5), involved in histone methylation, regulates AS of PRR9 [225, 226]. Another key spliceosome component involved in AS of circadian genes is SNW/SKI-INTERACTING PROTEIN (SKIP). It is proposed that SKIP regulates AS of CCA1, LHY, PRR7, PRR9 and TOC1 by modulating the recognition of 5′ donor and 3′ acceptor splice sites, and loss of SKIP causes a long-period phenotype [219]. Likewise, mutations in SPLICEOSOMAL TIMEKEEPER LOCUS 1 (STIPL1), which encodes a homolog of a human spliceosome protein, also cause a long-period phenotype. Additionally, in stipl1 mutants, transcript levels of the splice variants of CCA1, LHY, PRR9 and TOC1 are altered [227].
Core components of the spliceosomal U6 small nuclear ribonucleoprotein complex, encoded by the SM-like (LSM) genes, also regulate circadian rhythms in plants. Mutations in LSM5 or LSM4 in Arabidopsis extend the circadian period by affecting AS more than constitutive splicing [228]. Another spliceosomal small nuclear ribonucleoprotein assembly factor, GEMIN2, has been suggested to attenuate the effects of temperature on the circadian period by regulating AS of clock genes such as CCA1, TOC1 and PRR9 [229]. Despite these discoveries, complete details of the molecular mechanisms by which AS affects circadian components remain unknown.
11. Applied aspects of splice isoforms in controlling agricultural traits
Alternative splicing produces more than one mRNA from a single pre-mRNA molecule in plants, thus increasing transcriptome plasticity and proteome complexity [236] and affecting plant metabolism at different developmental stages [237]. AS therefore provides a means for plants to adapt to changing environments by regulating their fitness, particularly when they grow under stress [238], e.g. in the response of barley’s clock genes to low temperature [221] or during infection of rice by the blast fungus [239]. Recent advances in next-generation sequencing, coupled with extensive transcriptomic resources, have facilitated the understanding of the role of AS in regulating developmental processes in plants for adapting to stress-prone environments [240].
Splice variants affect agronomic characteristics in crops, e.g. floral development in cereals [
241], seed shattering and weight in rice [
242], grain size and weight in wheat [
189,
243], plant architecture in soybean [
244], as well as nutritional quality in rice [
207,
245], soybean [
246,
247], tomato [
248] and wheat [
249]. Genome-wide association studies (GWAS) can further reveal how AS variants diversify gene function and regulate variation in crops, as done by Chen et al. [22] in maize. They found ca. 20,000 unique splicing quantitative trait loci for 6,570 genes affecting protein functions in 366 inbred lines.
12. Conclusion
AS of pre-mRNA is widespread and a major source of transcriptome and proteome diversity, which in turn generates phenotypic variation. A variety of computational pipelines, including deep learning methods, are now available to analyze RNA-seq data to identify AS events, estimate isoform abundance, and detect differential expression of splice variants across tissues, conditions, and developmental stages.
Domestication and polyploidization (Brassica species, wheat), in addition to environmental perturbation, cause varying expression of AS isoforms in plants. Arabidopsis uses AS isoforms as a stress response mechanism to enhance its adaptation to a range of geographically diverse agro-ecologies. To date, many splicing quantitative trait loci (sQTLs) for a large number of genes with distinct protein functions impacting phenology, plant architecture, biomass yield or quality including nutrient homeostasis, and stress responses have been reported in grain (maize, rice, sorghum, wheat), oil (Brassica species, soybean), and fiber (cotton) crops. Many of these sQTLs colocalize with known pQTLs impacting phenotypic variation. Evidence also suggests that AS variants contribute to the functioning of symbiosis (mycorrhiza, rhizobium) in plants and to heterosis in grain and oil crops, and provide a mechanistic link between the core circadian clock genes and diverse environmental stimuli.
Though significant advances in genome-wide profiling of AS variants have been made in various crops, applying such advances in crop improvement programs poses significant challenges, which include, but are not limited to: i) establishing cost-effective high-throughput assays to identify AS variants in early breeding generations, ii) differentiating and quantifying the impact of sQTLs from pQTLs for genes impacting phenotypic variation, iii) accurately reconstructing transcript isoforms and quantifying the relative abundance of individual isoforms to decipher the biological functions of individual transcripts, iv) identifying common genetic tags (e.g. SNPs, InDels and structural variation) linked with AS variants and gene expression, and v) possible adverse effects of combining AS variants with trait gene(s) on phenotypic variation. Until such logistical issues are resolved, the exploitation of AS variants in crop improvement programs will be limited to the discovery and functional characterization of AS variants across tissues, conditions, and developmental stages in plants.