1. Introduction
Gene regulation is the collection of mechanisms by which cells increase or decrease the production of specific gene products. Cells from different tissues turn on specific genes and turn off others, which contributes to tissue-related differences in the expression of the whole repertoire of genes. Aberrations in gene regulation may promote deviation from normal pathway activity, sometimes leading to complex diseases. The expression of a gene may be regulated through DNA modification, transcriptional regulation, epigenetic regulation, regulation by RNA, etc. While only a small fraction of the genome codes for protein [1], the remaining genome, once thought of as junk, has emerged as a crucial hub of gene regulators.
These gene regulatory elements comprise non-coding DNA sequences that provide binding sites for proteins called transcription factors (TFs). TFs play an important role in controlling the rate of transcription of genetic information from DNA to RNA. TFs, either alone or in complexes with other proteins, promote (as activators) or block (as repressors) the recruitment of RNA polymerase to specific genes. TFs participate in gene regulation by binding to DNA regions such as enhancers and promoters, often located near the genes. Enhancers are non-coding DNA sequences that contain multiple activator and repressor binding sites. In contrast to enhancers, silencers are DNA regions that, when bound by specific TFs, repress gene expression. Marked by distinct chromatin features, enhancers contribute to the repertoire of epigenetic mechanisms responsible for cellular memory and cell type-specific gene expression [2]. In addition, enhancers play a vital role in spatiotemporal gene expression, that is, the tissue-specific activation of genes at specific times during development. To regulate gene expression, active enhancers come into contact with the promoter regions of nearby genes, or of distant genes through DNA looping [3]. While an active enhancer can interact with multiple promoters [2], an active promoter can also interact with multiple enhancers. Genetic mutations within enhancers have been implicated in tumors and various common diseases [4, 5]. Clusters of regular enhancers in close genomic proximity that are associated with high levels of transcriptional co-activators (e.g. mediators), bear active chromatin marks, and contain more sequence motifs for cell-type specific TFs than typical enhancers are called super-enhancers (SEs) [6]. The enhancers comprising an SE might be responsive to different signals, thus allowing multiple signaling pathways to regulate the transcription of a single gene [7]. Many GWAS SNPs have been shown to be significantly enriched in SEs [8, 9, 10].
Role of ChIP-Seq technology: Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is a standard assay for mapping genome-wide protein–DNA interactions; for example, it characterizes binding sites for TFs and other DNA-binding proteins and enables the identification of sequence motifs [11]. ChIP-Seq also contributes to the identification of histone marks, which act as major modulators of chromatin structure and thereby regulate gene expression epigenetically. Apart from bioinformatic approaches [12], algorithms such as Hypergeometric Optimization of Motif EnRichment (HOMER) [13] and Rank-Ordering of Super-Enhancers (ROSE) [6, 14] identify SEs from ChIP-Seq data. Since gene regulation is closely correlated with the 3D conformation of chromatin, other experimental assays such as Hi-C (high-throughput chromosome conformation capture) and Hi-C derivative technologies, viz. ChIA-PET (chromatin interaction analysis based on paired-end-tag sequencing), HiChIP (in situ Hi-C followed by chromatin immunoprecipitation), etc., are also used for enhancer identification. These technologies explore the chromatin structure to identify enhancer-target gene maps [15] that are also well validated by experimental methods [16]. Several databases built on these technologies provide information on enhancer-target gene, enhancer-disease, enhancer-target TF, enhancer-cancer type relations, etc., along with enhancer and SE identification. Based on ChIP-Seq experiments, SEs from cancer or normal tissues/cell lines are catalogued in databases such as SEdb 2.0 [17], SEanalysis [18], CenhANCER [19], ENdb [20], EnhancerAtlas 2.0 [21], etc.
Role of miRNA in gene regulation: Like non-coding DNA, non-coding RNA molecules such as microRNAs (miRNAs) are also involved in the regulation of gene expression. These approximately 21-nucleotide-long miRNAs are a class of small RNA regulators that participate in RNA silencing and control gene expression post-transcriptionally [22]. miRNAs bind to target messenger RNA (mRNA) transcripts of protein-coding genes and regulate their translation. Individual miRNAs often target hundreds of mRNAs to modulate the expression of genes in various critical pathways (developmental, metabolic, etc.) or to completely switch mRNA translation on or off [22]. Animal miRNAs are complementary to the 3′ UTR of their target mRNAs, whereas plant miRNAs are complementary to coding regions. For partially complementary miRNAs to recognize their targets, nucleotides 2-7 of the miRNA (called the seed region) must be perfectly complementary to the target mRNA. Oftentimes, animal miRNAs form imprecise base pairs, causing translational repression, while the near-perfect base pairing in plants results in target degradation. Experimental identification of miRNA genes and their targets was challenging and involved laborious techniques, which led to the development of computational algorithms [23] based on features such as base-pairing patterns, thermodynamic stability of the miRNA-mRNA hybrid, etc. [24]. Using these algorithms, several computational target prediction methods, viz. TargetScan [25], TargetScanS [26], PicTar [27], miRanda [28], etc., have been implemented. Many databases, such as miRBase Targets [29], TarBase5.0 [30], etc., have been built to integrate functional annotations with the predicted miRNA sequences and their targets [24]. These databases are based either on literature mining or on computational algorithms that use the aforementioned target prediction methods. However, depending on the algorithms used, computational methods incur varying numbers of false positives in identifying target mRNAs. Thus, the overlap of information across different databases is often low [24]. Some databases, such as GOMir [31], miRecords [32], etc., integrate information from multiple prediction algorithms to obtain more accurate results, but this often worsens the inference [33]. This necessitates either experimental validation that directly checks the interaction between a miRNA and its target mRNA [24] or computational methods that learn from the true positive results of the existing large databases [34]. However, recent databases such as mirDIP 4.1 [35] claim to provide more accurate miRNA-target predictions by combining predictions from 30 independent resources using an integrative score. Other methods, such as MiRror [36] and miRGate [37], also provide integrated target predictions.
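The seed rule described above can be illustrated with a short sketch. It only scans a 3′ UTR for sites that pair perfectly with nucleotides 2-7 of a miRNA; it is not the scoring scheme of TargetScan, PicTar or miRanda, and it ignores thermodynamic stability and site conservation. The function names and both input sequences are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: a bare seed-match scan, not a published target predictor.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_sites(mirna: str, utr: str, first: int = 2, last: int = 7) -> list[int]:
    """Return 0-based start positions in `utr` that pair perfectly with the seed.

    Both sequences are written 5'->3'. `first` and `last` are 1-based, inclusive
    miRNA positions; the default 2-7 follows the seed definition in the text.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[first - 1:last]
    site = reverse_complement(seed)  # the mRNA sequence the seed would pair with
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


if __name__ == "__main__":
    toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # toy 22-nt miRNA, illustrative input
    toy_utr = "AAAUACCUCGGGCUACCUCA"      # hypothetical 3' UTR fragment
    print(seed_match_sites(toy_mirna, toy_utr))
```

A scan of this kind reports every perfect seed match, which is one reason why purely sequence-based predictions tend to contain false positives.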
Joint effect of miRNAs and TFs in gene regulation: Apart from regulating one another, miRNAs and TFs sometimes co-regulate ‘target hubs’ of genes related to specific pathways [38]. Experiments have shown that the miR-200 family and miR-205 together regulate the epithelial-to-mesenchymal transition (EMT) by targeting the E-cadherin transcriptional repressors ZEB1 and SIP1, which were previously implicated in EMT and tumour metastasis. Since EMT facilitates tissue remodelling during embryonic development and is viewed as an essential early step in tumour metastasis, these miRNAs are implicated in tumour progression via TFs [39]. Other experimental studies also illustrate the vital role of miRNA-TF co-regulation in gene regulation [40]. The interaction between miRNAs and TFs through auto-regulatory feedback loops is a common regulatory mechanism for controlling gene expression [41]. Many computational methods and databases identify networks to predict miRNA-TF, miRNA-mRNA, miRNA-TF-mRNA [42], and miRNA-disease regulatory relations [43]. Some of these methods are based on networks that consider the feed-forward loops, feedback loops, etc. that control gene regulation [44].
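As a minimal sketch of how such loops can be enumerated, the code below searches three edge lists (TF to miRNA, TF to gene, miRNA to gene) for TF-miRNA-gene feed-forward loops and for TF-miRNA feedback pairs. It does not reproduce any of the cited methods; the edge representation, the function names and the regulator names in the usage example are assumptions made for illustration.

```python
from collections import defaultdict

Edge = tuple[str, str]  # (regulator, target); an assumed, simplified edge format


def find_feed_forward_loops(tf_mirna: set[Edge], tf_gene: set[Edge],
                            mirna_gene: set[Edge]) -> list[tuple[str, str, str]]:
    """Return (TF, miRNA, gene) triples with TF->miRNA, miRNA->gene and TF->gene."""
    targets_of_mirna = defaultdict(set)
    for mirna, gene in mirna_gene:
        targets_of_mirna[mirna].add(gene)

    loops = []
    for tf, mirna in tf_mirna:
        for gene in targets_of_mirna.get(mirna, ()):
            if (tf, gene) in tf_gene:  # the direct TF->gene edge closes the loop
                loops.append((tf, mirna, gene))
    return sorted(loops)


def find_feedback_loops(tf_mirna: set[Edge], mirna_tf: set[Edge]) -> list[Edge]:
    """Return (TF, miRNA) pairs in which each regulates the other."""
    return sorted((tf, mirna) for tf, mirna in tf_mirna if (mirna, tf) in mirna_tf)


if __name__ == "__main__":
    # Hypothetical toy network; the names are placeholders, not curated interactions.
    tf_mirna = {("TF_A", "miR_x"), ("TF_B", "miR_y")}
    tf_gene = {("TF_A", "gene_1"), ("TF_B", "gene_2")}
    mirna_gene = {("miR_x", "gene_1"), ("miR_y", "gene_3")}
    print(find_feed_forward_loops(tf_mirna, tf_gene, mirna_gene))
    print(find_feedback_loops(tf_mirna, {("miR_y", "TF_B")}))
```

In practice the three edge lists would come from separate resources, so the reliability of each reported loop is bounded by the least reliable of its edges.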
Evidence of gene regulation jointly via miRNAs, TFs and SEs: Collective evidence also illustrates the potential of SEs to unravel the gene regulatory network to a great extent. Recent experimental validation of the involvement of SEs in miRNA production, miRNA-associated disease, and TFs [45, 46] provides insight into this potential. For example, over-expression of miR-1301 induced by the disruption of SE KLF6 leads to a significant inhibition of cell proliferation [47]. Although enhancers and SEs have been found experimentally to be associated with disease genes, their connections within the gene regulation network have not been explored much computationally. Some databases [48, 49, 50] connecting SEs/enhancers with target genes, miRNAs and TFs are available, but there is still a lack of computational methods that systematically explore the regulation network. Besides, these databases are constructed based on overlaps between SEs and (a) the neighborhood of promoter regions of target genes, (b) TF binding sites or genes encoding TFs, and (c) the neighborhood of the transcription start sites of miRNAs [49, 50]. SEs are usually identified with the ROSE or HOMER algorithm from publicly available ChIP-Seq data [48] and integrated with other information based on the above approaches. Although these databases provide information within the transcriptional regulatory regions, they are unable to explore the joint effect of these regulators on gene expression. As these databases are built on multiple databases that gather specific information, computational methods might leverage them to develop robust approaches and/or validate known findings.
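The overlap-based construction described above can be sketched as a simple interval comparison. The example links an SE to a gene or miRNA when the SE overlaps a window around the corresponding transcription start site (TSS). The 10 kb flank, the half-open coordinate convention and all names and coordinates are assumptions for illustration; the cited databases may use different window sizes and rules.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def tss_window(chrom: str, tss: int, name: str, flank: int = 10_000) -> Interval:
    """Neighborhood of a TSS; the flank size is a hypothetical choice."""
    return Interval(chrom, max(0, tss - flank), tss + flank, name)


def link_ses_to_features(ses: list[Interval], windows: list[Interval]) -> list[tuple[str, str]]:
    """Return (SE name, feature name) pairs whose intervals overlap."""
    return [(se.name, w.name) for se in ses for w in windows if overlaps(se, w)]


if __name__ == "__main__":
    # Toy coordinates, not taken from any database.
    ses = [Interval("chr1", 100_000, 120_000, "SE_1")]
    windows = [
        tss_window("chr1", 125_000, "gene_near"),
        tss_window("chr1", 200_000, "miRNA_far"),
        tss_window("chr2", 110_000, "gene_other_chrom"),
    ]
    print(link_ses_to_features(ses, windows))
```

A link produced this way records genomic proximity only, which is why such databases cannot by themselves describe the joint effect of the regulators on expression.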
In this review, we first summarize our current understanding of the computational methods that unravel the gene regulation network in connection with miRNAs and TFs obtained from ChIP-based experiments. We also outline the necessity of incorporating SEs, another type of ChIP-Seq-derived information, into the network and formulate open problems that highlight computationally motivating research areas. We broadly divide the research areas into three categories: (1) Unravelling indirect network of gene regulation, (2) Identification of network motifs to increase the precision of target gene identification, and (3) Enriched pathway identification by jointly dissecting gene regulators. In the first category, we review computational methods or pipelines that explore the effect of miRNAs on gene regulation through indirect networks. Some methods [51] accumulate information from databases on TF-associated target genes and miRNAs and combine it to predict genes that miRNAs target via TFs, while others explore the agreement between different data types and iteratively refine predictions [52]. We highlight that a similar idea can be explored in the realm of SEs as well, because SEs interact with miRNAs to deregulate genes and multiple databases provide specific information about SEs. In the second category, we explore methods that identify network motifs through which miRNAs impact target genes. Most of the work in this area deals with finding auto-regulatory loops involving TFs. Although SEs have been identified experimentally to form similar loops, such connections remain untapped computationally. Moreover, the presence of SE-related interactions in auto-regulatory loop-specific databases [50] emphasizes the importance of systematic identification of such network motifs. The third category highlights leveraging SEs to identify important pathways. We speculate that this will provide substantial insight into tissue-specific functions. Although this research area is well established in connection with TFs, miRNAs and target genes, SEs have remained largely neglected despite their potential in the gene regulation network. Although experimental studies and databases have identified the association of SEs with the gene expression network, these sources are either very specific (as in experimental studies) or too general (as in databases), often being measured across multiple studies/tissues/cell lines. Appropriate computational methods for the systematic identification of regulators in the gene regulatory network are therefore crucial. Existing context-specific databases (summarized in Table 1) might be leveraged for developing such methods. Validation of the findings through experimental studies would be ideal, but validation from existing publications also provides evidence of the efficacy of the developed methods.
In the next sections, we present three computationally motivating and challenging areas of research related to unravelling some of the well-known players (viz. miRNAs, TFs) of gene regulatory networks, along with some recently popular ones such as SEs (Figure 1). With the existing methods available in each area (summarized in Table 2), we illustrate the potential insights that would be gained and the challenges that would arise by incorporating SEs, and we pose motivating open problems for understanding the gene regulation network a little better. The emerging experimental studies illustrating gene regulation in the context of SE association (summarized in Table 3) provide a compelling reason to dissect SEs via computational methods to improve our understanding of the gene regulation network.
5. Discussion
miRNAs and TFs have been extensively studied in connection with gene regulation. Aberrations in these gene regulators have been implicated in various diseases. In this review, we highlight the role of ChIP-Seq data, which potentially yield much more information than that on TFs alone, in unravelling the gene regulation network under miRNA control. Recent experimental studies have illustrated the synergistic role of miRNAs and TFs on enhancers and SEs in the gene regulation network, particularly in cancer [81]. SEs are not only observed near genes with cell-type specific functions, but they are also considered sensitive to alterations in chromatin-based mechanisms of gene regulation [10]. For example, in Burkitt’s lymphoma, strong enhancers come into close proximity with oncogenes via genomic rearrangements such as translocation. Compared to normal enhancers, SEs are enriched in sequence motifs corresponding to cell-type specific master TFs. SEs have been implicated in driving the biogenesis of miRNAs that are crucial for cell identity, via enhancement of both transcription and Drosha/DGCR8-mediated primary miRNA (pri-miRNA) processing. CRISPR/Cas9 genomics revealed that SEs facilitate Drosha/DGCR8 recruitment and pri-miRNA processing, which boosts cell-specific miRNA production [45]. The tissue-specific and evolutionarily conserved atlas of miRNA expression and function is observed to be largely shaped by SEs and broad H3K4me3 domains identified by ChIP-Seq. Both well-studied cancer-related miRNAs [82] and other miRNAs with SE alterations have been associated with cancer hallmarks. Moreover, targeting SE components, for example by disrupting SE structure or inhibiting SE cofactors, has attracted attention as a therapeutic option across various cancers [77].
It is well known that a single miRNA can target multiple genes and mediate their post-transcriptional regulation. But accurately identifying these target genes, experimentally or computationally, is still a challenge. Numerous databases are constructed from curated experimental gene targets and/or computationally predicted targets, but the small overlap between these databases indicates the presence of false positive predictions. This discrepancy might be due to the lack of appropriate methods for combining more of the players involved in the regulatory network. With the advent of NGS technology, we have access to a huge collection of high-throughput data, but it is still challenging to understand the complexity of gene regulation. Computationally, TFs and miRNAs have been extensively studied in relation to gene expression regulation because their biology is better understood. SEs, however, have not been explored much in parallel with TFs and miRNAs, despite experimental validation of their involvement in the gene regulation network. Although SEs, like TFs, can be identified from ChIP-Seq experiments, their identification is not straightforward [83]. Oftentimes, putative enhancers in close genomic proximity are termed SEs. Although the definition of SEs is not very clear, a few tools and databases such as HOMER [13], ROSE [6, 14], EnhancerDB [50], etc. are now available that provide information on SEs and enhancers.
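The limited agreement between target databases noted above can be quantified in several ways. As one possible illustration, the sketch below computes the pairwise Jaccard index between sets of predicted miRNA-target pairs. The choice of the Jaccard index, the function names and the toy prediction sets are assumptions and are not taken from the cited studies.

```python
from itertools import combinations

Pair = tuple[str, str]  # (miRNA, target gene)


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(predictions: dict[str, set[Pair]]) -> dict[tuple[str, str], float]:
    """Jaccard index for every unordered pair of resources, keyed by sorted names."""
    return {
        (x, y): jaccard(predictions[x], predictions[y])
        for x, y in combinations(sorted(predictions), 2)
    }


if __name__ == "__main__":
    # Hypothetical prediction sets; names do not refer to real database content.
    toy = {
        "resource_A": {("miR_x", "gene_1"), ("miR_x", "gene_2")},
        "resource_B": {("miR_x", "gene_2"), ("miR_x", "gene_3")},
    }
    print(pairwise_overlap(toy))
```

A low index between two resources does not reveal which of them is wrong, so such a comparison flags disagreement rather than identifying false positives.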
Here, we aim to present the status of the existing computational methods that unfold the gene regulatory network in connection with SEs, TFs and miRNAs jointly. We also bring forth gaps in knowledge in three different areas as open problems for the scientific community to consider. In connection with gene expression regulation, we first explore networks that dissect the roles of miRNAs, TFs and gene expression, collected from different databases, using integrative computational approaches. A number of these methods iteratively update information from one data source using other data sources. Other methods use multiple databases and integrate partial information to get an overall picture of predicted associations. But none of these methods have explored SEs in this connection, despite SEs being experimentally implicated in aberrant gene expression. Secondly, different network motifs, or biochemical wiring patterns, have been studied relating genes to their regulators, such as TFs and miRNAs. Computationally identifying these patterns might provide greater insight into the gene regulatory network. Motifs such as feed-forward loops (FFLs) and others aid in understanding the actual direction of the flow of information. However, identifying motifs in connection with SEs or enhancers has been largely overlooked. The presence of databases such as EnhFFL, which curate motifs in connection with enhancers/SEs, emphasizes the importance of such information to the scientific community. These FFLs, however, are created based on physical overlaps among enhancer-TF-miRNA, enhancer-TF-gene, enhancer-TF, or enhancer-miRNA-gene across several tissues/cell lines in human and mouse. Lastly, we discuss the importance of identifying pathways through enrichment analysis. Since the gene regulatory network is quite complex, with multiple genes being targets of one or more miRNAs and vice versa, it is imperative to focus collectively on genes with similar functions and their regulators.
Moreover, since chromatin folding plays a huge role in gene regulation, we emphasize dissecting information from ChIP-Seq experiments to better understand the effect of gene regulators in close-knit networks or pathways. The availability of SE databases such as SEanalysis 2.0, curated from multiple ChIP-Seq datasets, could provide an important piece of information. Although computational methods [74] for identifying pathways associated with miRNAs and TFs exist, we are yet to develop similar approaches involving SEs that might provide substantial insight into the gene regulation network.
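To make the idea of pathway enrichment concrete, the following sketch applies a one-sided hypergeometric (over-representation) test to a set of genes linked to a regulator, for instance the putative targets of an SE or a miRNA. The hypergeometric test is one common choice and is assumed here; it is not claimed to be the procedure of the cited methods. The function names and the toy gene sets are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(population: int, successes: int, draws: int, observed: int) -> float:
    """P(X >= observed) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` belong to the pathway."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


def pathway_enrichment(targets: set[str], pathways: dict[str, set[str]],
                       background: set[str]) -> dict[str, float]:
    """Return an uncorrected over-representation p-value per pathway."""
    targets = targets & background  # restrict everything to the background universe
    results = {}
    for name, members in pathways.items():
        members = members & background
        overlap = len(targets & members)
        results[name] = hypergeom_upper_tail(
            len(background), len(members), len(targets), overlap
        )
    return results


if __name__ == "__main__":
    # Toy universe of ten genes; pathway names and memberships are placeholders.
    background = {f"g{i}" for i in range(10)}
    pathways = {"pathway_P": {f"g{i}" for i in range(5)},
                "pathway_Q": {f"g{i}" for i in range(5, 10)}}
    targets = {f"g{i}" for i in range(5)}
    print(pathway_enrichment(targets, pathways, background))
```

The choice of background matters: restricting it to genes that could in principle be linked to an SE in the tissue under study gives a more conservative test than using all annotated genes.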
In conclusion, while significant progress has been made in understanding gene regulation networks involving TFs and miRNAs, the integration of SEs into computational analyses remains an open research area. Developing computational methods to incorporate SEs into regulatory network predictions will enhance our understanding of gene regulation mechanisms and facilitate the identification of functionally relevant pathways in health and disease.