As previously discussed, technological and computational advances in recent decades have allowed the identification of thousands of LncRNAs whose molecular alterations are associated with different types of cancer. Initial efforts to discover cancer-related LncRNAs relied on classical functional genomics approaches, primarily characterizing the global transcriptomic landscape, evolutionary conservation, or proximity to known cancer genes. Although these analytical strategies provided valuable insights, a major challenge remains: altered transcriptional profiles alone do not indicate a causal role in cancer programs. At present, diverse biological evidence describing well-known cancer-associated LncRNAs has been presented in the literature, most of it focused on the function of a single LncRNA linked to malignant transformation through its roles in gene regulation and its impact on cancer hallmarks.
In this section, we review the progress made in understanding multi-omics biological features through machine learning and artificial intelligence approaches, and we provide an overview of novel advances in revealing the regulatory bases underlying the functions of LncRNAs across various molecular and cellular levels.
7.1. Describing new LncRNA drivers in cancer through multi-omic integration
Apart from a limited number of well-known LncRNAs, the landscape of cancer LncRNAs is far from complete. The functional repertoire of LncRNAs in cancer can be divided into two categories: 1) driver genes, resulting from early mutations that are positively selected during tumorigenesis; and 2) downstream genes, resulting from non-genetic changes in expression, localization, or molecular interactions [104]. Although both categories contribute to cancer phenotypes, most efforts to discover cancer LncRNAs rely only on differential expression approaches. To overcome this limitation, several computational methods have recently been developed to identify LncRNA driver genes by analyzing coordinated omic alterations to detect signals of positive selection (Figure 1, bottom panel).
One of the first statistical methods for driver-gene discovery was OncodriveFML [105], which identifies tumor-related LncRNAs by interrogating somatic mutations in coding and non-coding regions together with gene expression. Unlike other methods, OncodriveFML computes a functional impact score and compares it against a local mutational background estimated for specific regions, which defines signals of positive selection in genes across tumor tissues. The method assumes that the mutational background is influenced by chromatin architecture, replication timing, and transcription factor binding sites. By accounting for this local background, as well as for mutational and expression patterns, the tool can discover LncRNAs located in potential genomic driver regions involved in tumorigenesis. Applying the method to whole-genome tumor data sequenced by TCGA and the Cancer Genome Project identified MALAT1 and MIAT as two LncRNAs that exhibit an excess of high-impact mutations.
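The idea of testing for an excess of high-impact mutations against a local background can be illustrated with a small permutation test. The sketch below is not the OncodriveFML implementation; the function name, the per-site impact scores, and the per-site background weights are hypothetical inputs chosen for illustration. It draws as many simulated mutations as were observed in a region, according to assumed local mutation probabilities, and asks how often the simulated mean impact reaches the observed one.

```python
import random


def impact_bias_pvalue(observed_scores, site_scores, site_weights,
                       n_sim=10000, seed=0):
    """Empirical p-value for an excess of high-impact mutations in a region.

    observed_scores: impact scores of the mutations seen in the region.
    site_scores:     impact score of every possible mutation in the region
                     (hypothetical precomputed values).
    site_weights:    relative probability of each possible mutation under
                     the assumed local mutational background.
    """
    rng = random.Random(seed)
    k = len(observed_scores)
    observed_mean = sum(observed_scores) / k

    hits = 0
    for _ in range(n_sim):
        # Simulate k mutations from the local background model.
        simulated = rng.choices(site_scores, weights=site_weights, k=k)
        if sum(simulated) / k >= observed_mean:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)
```

A small p-value here means that the observed mutations are biased toward high-impact sites more than the background alone would explain, which is the kind of signal the paragraph above describes as positive selection.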
Another integrative method to predict LncRNA drivers relevant in tumorigenesis is ExInAtor [106], which identifies genes with a high load of somatic single nucleotide variants across tumor genomes using DNA mutational patterns (a local trinucleotide background model) and expression data as proxies for functionality. ExInAtor was run on 1112 whole genomes from 23 cancer types deposited in GENCODE and predicted 15 high-confidence LncRNA drivers, of which 9 are novel LncRNAs and 6 are known cancer-related transcripts, including PCA3, MALAT1, BCAR4, LncRNA-ATB and SAMMSON. Most of the above-mentioned LncRNAs were found to be tumor-specific, although NEAT1 and MALAT1 were identified in a pan-cancer context, reaffirming their roles in tumorigenesis. The set of previously unreported driver LncRNAs includes: MIR100HG, AP000469.2, RP11-308N19.1, RP11-455B3.1, RP11-332J15.1, RP11-707A18.1, RP11-6c14.1, RP11-1101K5.1, RP11-354A14.1, RP11-189E14.4 (Figure 2b). These novel candidates are evolutionarily conserved, expressed in normal tissue, and have greater gene length. They also tend to lie close to cancer SNPs and are encoded in CNV regions, pointing to a role in tumorigenesis. The authors highlight MIR100HG, which is highly conserved and presents canonical histone modifications in the promoter region, as well as transcription factor binding sites.
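A mutation-burden comparison of this kind can be sketched as a one-sided enrichment test of exonic mutations against a local background region. The code below is a simplified illustration and not ExInAtor itself: the hypergeometric test, the function names, and the assumption of at most one mutation per position are choices made for this example, and the trinucleotide matching of the background is omitted.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def exon_burden_pvalue(exon_mutations, exon_length,
                       background_mutations, background_length):
    """One-sided test for excess mutations in exons versus local background.

    Positions are treated as balls in an urn: mutated positions are the
    'successes', and the exonic positions are the 'draws'.
    Assumes at most one mutation per position (illustrative simplification).
    """
    population = exon_length + background_length
    successes = exon_mutations + background_mutations
    return hypergeom_upper_tail(exon_mutations, population,
                                successes, exon_length)
```

For example, `exon_burden_pvalue(12, 1000, 20, 10000)` asks whether 12 mutations in 1000 exonic bases is more than expected given 20 mutations in 10000 background bases. A fuller model would restrict the background to positions with matching trinucleotide context, as the text describes.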
More recently, to enhance the discovery of cancer-related LncRNAs and gain insights into their biology, the Cancer LncRNA Census (CLC) was presented as a tool that provides functional or genetic evidence of LncRNA roles in cancer by integrating genomic and transcriptomic data linked to cancer in different mammalian species [104]. It is still not completely clear whether mutated LncRNAs can drive tumorigenesis and whether such altered functions are conserved during evolution. CLC therefore treats functions conserved between humans and mice as a relevant feature, since they could provide strong evidence for the biological role of LncRNAs, both in cancer and under physiological conditions. Application of this computational model revealed the colocalization of cancer LncRNAs with known protein-coding cancer genes. In addition, 10 tumor-causing mutations were identified in 8 LncRNA orthologs (DLEU2, GAS5, MONC, NEAT1, PINT, PVT1, SLNCR1, and XIST), some of which had already been reported in cancer.
The integration of DNA, RNA, and protein alterations, and of the way they cooperatively interact, is providing new evidence for the relevance of LncRNA dysregulation in cancer, and models for uncharacterized LncRNAs have therefore been developed to define cancer cues. For instance, LongHorn [107] is a recently presented computational method that integrates genomic, transcriptomic, and proteomic alterations and predicts LncRNA regulatory networks dysregulated in cancer pathways by modeling their impact on the activity of transcription factors, RNA-binding proteins, and microRNAs, on LncRNA-promoter binding sites, and on post-transcriptional activation/inhibition. Applying this method to 14 cancer types from TCGA predicted multiple LncRNA candidates whose dysregulation affects other known cancer genes and pathways, mainly in a tumor-specific context, and influences tumor etiology. OIP5-AS1, TUG1, NEAT1, MALAT1, XIST, and TSIX were inferred to regulate cancer signaling in multiple tumor contexts. Additionally, analyses of the LncRNA networks pointed to an enrichment of LncRNA binding sites in the promoter regions of messenger RNAs, enhancing the transcriptional effects of the LncRNAs. Functional experimental analyses in breast and gynecologic cancer cells showed that knock-down of OIP5-AS1 downregulated PTEN, boosting cell proliferation. Similarly, WT1-AS silencing downregulated predicted targets, including BCOR, FOXO4, PBX1, WT1, and ZEB1, in ovarian cell line models, while exogenous downmodulation of TUG1 negatively regulated CELF1, CSF1, FGFR2, NRAS, PDGFC, and SOS1. This characterization confirmed most of the predictions made with LongHorn.
An interesting approach presented by Mitra and collaborators to predict biological dependencies of uncharacterized LncRNAs centers on the identification of co-essential modules through the integration of copy number, epigenetic, and transcriptomic data from the LncRNA landscape of exogenous knockout or activation screens established with CRISPR approaches [108]. Applying this model to multi-omic data from cancer cell lines, the authors estimated 289 LncRNA-gene co-expression networks that recapitulate known proliferation-regulating LncRNAs and predict novel, poorly characterized LncRNAs related to proliferative signaling, such as PSLR-1/2, which induce G2 arrest through modulation of the FOXM1 transcriptional network and whose exogenous expression inhibits proliferation and colony formation in in vitro models.
Although DNA methylation dysregulation is associated with cancer, the molecular mechanisms by which methylation and LncRNA transcriptional patterns are reciprocally modulated in cancer remain largely unknown. A novel integrative analysis framework called MeLncTRN (Methylation mediated LncRNA Transcriptional Regulatory Network) integrates transcriptome, DNA methylome, and copy number variation profiles to identify the regulatory circuits directed by epigenetically driven LncRNAs across 18 cancer types. Analysis of 5970 TCGA tumor samples showed that the association between LncRNAs and DNA methylation mechanisms is common and conserved across multiple cancer types, revealing a complex interplay between LncRNAs and epigenetic modulators such as the DNA cytosine methyltransferase DNMT1 and histone modification proteins such as EZH2. For example, FAM83H-AS1, TUG1, PVT1, and LINC00511 act as scaffolds to enhance EZH2 or DNMT1 binding and consequently repress the expression of their mRNA targets [109]. This observation expands the understanding of LncRNA roles in transcriptional regulation circuits beyond their miRNA sponge activity as competitive endogenous RNAs (ceRNAs) [110].
Emerging evidence has also indicated crosstalk between LncRNAs and genomic instability, a relevant hallmark of cancer. Novel approaches integrating ChIP-seq, WGS, and WES data revealed an unexpected relationship between oncogenic LncRNAs and epigenetic alterations that contributes to chromosome fragility in cancer. To characterize the LncRNA-based mechanism by which aberrant epigenetic signatures can be generated, the authors used as a conceptual proxy the subtelomeric chromosome locus 8q24, which contains the cMYC gene and a large domain of the histone H3 variant CENP-A, both altered in cancer cells of diverse solid tumors. This region also encodes 5 unique LncRNA sequences, namely PCAT1, PCAT2, CCAT1, CCAT2, and PVT1, that negatively modulated the occupancy of CENP-A at the chromosomal locus. Their results indicated a competition between LncRNA transcription and R-loop occupancy that strongly contributes to the maintenance of CENP-A invasion, which ultimately impairs chromosome stability [111].
One oncological milestone already mentioned is TMB, related to the infiltration of diverse immune cell populations that can enhance or limit cancer programs [112]. Recently, increasing evidence has revealed that LncRNAs can play fundamental roles in the regulation of the immune system, but only a few immune-related LncRNAs have been described in cancer. Therefore, novel approaches to shed light on this question have been developed [113,114]. Through an integrative analysis of LncRNA expression, tumor immune response signatures, and genome-wide DNA methylation data in 9626 tumor samples across 32 cancer types, the lincRNA-based immune response (LIMER) tool [115] revealed 7528 lincRNAs associated with tumor immune signatures. Of interest, EPIC1 was detected as a relevant immune-related LncRNA inversely correlated with MHC expression and with CD8+ T-cell activation and infiltration. In vitro and in vivo models demonstrated that EPIC1 induces tumor immune evasion and resistance to immunotherapy through the epigenetic suppression of tumor cell antigen presentation via EZH2 interaction. Another interesting tool is ImmLnc [113], which systematically infers candidate LncRNA modulators of immune-related pathways by matching gene and LncRNA expression profiles. The tool prioritizes cancer-related LncRNAs by comprehensively characterizing the LncRNA landscape and its correlation with the immunome. One of the first insights is that tumors originating from similar tissues are likely to share LncRNA immune regulators. Further, novel subtypes identified through ImmLnc show distinct mutation burden, immune-cell infiltration, expression of immunomodulatory genes, response to chemotherapy, and prognosis.
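The general principle behind such tools, relating LncRNA expression to an immune signature across tumor samples, can be illustrated with a rank correlation. The sketch below is a deliberately simplified stand-in and does not reproduce LIMER or ImmLnc, which may apply additional adjustments and data layers. The function names and the input layout (a dictionary of per-sample expression values and a per-sample immune score) are assumptions made for this example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_immune_lncrnas(lncrna_expression, immune_score):
    """Sort LncRNAs by absolute correlation with a per-sample immune score.

    lncrna_expression: dict mapping LncRNA name -> list of expression
                       values, one per tumor sample (hypothetical layout).
    immune_score:      list of immune signature scores, same sample order.
    """
    correlations = {
        name: spearman(values, immune_score)
        for name, values in lncrna_expression.items()
    }
    return sorted(correlations.items(),
                  key=lambda item: abs(item[1]), reverse=True)
```

In this toy setting, an LncRNA with a strongly negative correlation to an antigen-presentation or T-cell infiltration score would rank near the top, in the same direction as the inverse correlation reported for EPIC1 in the text.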