3.1. Identification of the Three Subtypes Based on Sample Omics Data
The violin plots depicting the top ten most significantly differentially expressed mRNAs are presented in
Figure 1(A). It is of particular interest that elevated expression levels of
CLDN3 in gastric cancer influence tumor cell permeability, facilitating their traversal across the basement membrane and extracellular matrix, thus potentially contributing to oncogenesis[
34,
35].
CDH3, a gene predominantly overexpressed in gastric cancer, is associated with cancer invasion and metastasis. The protein it encodes facilitates the proliferation and mobility of cancer cells[
36].
Figure 1.
Significantly different characteristics between cancer and normal samples: (A) top ten most significant mRNAs; (B) most significant differential miRNAs; (C) differential boxplots of 22 immune cells.
Figure 1.
Significantly different characteristics between cancer and normal samples: (A) top ten most significant mRNAs; (B) most significant differential miRNAs; (C) differential boxplots of 22 immune cells.
Figure 1(B) illustrates the general up-regulation of
hsa-miR-21-5p, a microRNA (miRNA) that expressed in a variety of cancers, especially in colorectal cancer, in which
hsa-miR-21-5p contributes to tumor development and progression of tumors through the modulation of multiple biological processes, such as apoptosis and inflammatory responses[
37].
Box plots of immune cell percentages for the two samples showed considerable variation in the proportion of certain immune cells, leading to the selection of data from 12 specific immune cells for further analysis.
Following four fusion iterations of the four data models, convergence was successfully attained. The iterative process, as depicted in
Figure S2, illustrates the progressive convergence of the four similarity matrices across iterations. Subsequently, spectral clustering was employed to confine the class count (k) within the range of [
2,
5], utilizing the elbow method to ascertain optimal k. In total, the study classified 515 samples into three subtypes, comprising 212, 132, and 171 samples, respectively. To more vividly exhibit the characteristic differences among the subtypes, the top twenty differential features of mRNA and RNA, along with the comprehensive data of immune cells, were filtered using the Kruskal-Wallis test.
Figures 2(A) and (C) illustrate the expression profiles of samples across different subtypes under the top twenty DEGS, with (C) presenting the mean expression values for samples within each subtype. In these mRNA expressions, samples from subtype I exhibit relatively high levels, while samples from subtype II show comparatively lower levels, and samples from subtype III have the lowest expression. Figures 2(B) and (D) depict the expression of the first twenty differential miRNAs among samples of various subtypes, with (D) also presenting the average expression values for samples of each subtype. For the initial nine miRNA features, the expression levels across the three subtypes are presented in descending order, in alignment with the mRNA results. However, for the last eleven miRNA features, the expression outcomes are nearly inversely related. Corresponding box plots for the first six features are provided in Figure 3, revealing significant differences among subtypes, thereby preliminarily confirming the validity of the cancer typing methodology followed in this study.
Figure 2.
Heatmaps of the top 20 significantly differentially expressed genes(DEGs): (A) mRNA expression heatmap; (B) miRNA expression heatmap; (C) mRNA expression heatmap after taking the mean for the corresponding feature of the same subtype sample; (D) miRNA expression heatmap after taking the mean for the corresponding feature of the same subtype sample.
Figure 2.
Heatmaps of the top 20 significantly differentially expressed genes(DEGs): (A) mRNA expression heatmap; (B) miRNA expression heatmap; (C) mRNA expression heatmap after taking the mean for the corresponding feature of the same subtype sample; (D) miRNA expression heatmap after taking the mean for the corresponding feature of the same subtype sample.
Figure 3.
Six features with the most significant differences among the three subtypes: (A) top six most significantly differentially expressed mRNAs; (B) top six most significantly differentially expressed miRNAs.
Figure 3.
Six features with the most significant differences among the three subtypes: (A) top six most significantly differentially expressed mRNAs; (B) top six most significantly differentially expressed miRNAs.
Figure 4 illustrates the expression patterns of various subtypes of immune cells, focusing on the first 11 classes identified through p-value testing. In subtype I, specific immune cells such as resting NK cells and plasma cells exhibit reduced activity (blue), whereas M2 macrophages and regulatory T cells (Tregs) display heightened activity (red). Subtype II presents a pattern distinct from that of subtype I, and subtype III diverges from subtype I in the activity of most cell types, with M1 and M2 macrophages demonstrating moderate to high activity, and resting NK cells and Tregs showing lower activity. As depicted in Figure 4(B), subtype I is characterized by a higher proportion of M2 macrophages and Tregs, which may indicate a more potent anti-inflammatory or immunomodulatory role. Subtype II is marked by an increased presence of CD8 T cells and plasma cells, potentially linked to a more robust immune response or antibody production. In contrast, subtype III exhibits a higher proportion of M1 and M2 macrophages, which could be associated with tissue repair and modulation of the tumor microenvironment.
Figure 4.
Immune cell characteristics of different subtypes: (A) Heat map of immune cell characteristics of samples of the same subtype after taking the mean value; (B) Difference in the percentage of immune cells in samples of different subtypes.
Figure 4.
Immune cell characteristics of different subtypes: (A) Heat map of immune cell characteristics of samples of the same subtype after taking the mean value; (B) Difference in the percentage of immune cells in samples of different subtypes.
3.2. Identification of Hub Genes of Different Subtypes by WGCNA
Weighted Gene Co-expression Network Analysis (WGCNA) was conducted on mRNA data based on the samples from three subtypes, encompassing a total of 515 samples and 5844 genes. Initially, during the analysis, 5000 genes with the highest variability were selected. Using the histogram algorithm, the soft-thresholding power β=3 was identified, thereby achieving an R² value of 0.88 and an average connectivity below 100 to meet our criteria (Figure S3). It can be observed in Figure 5(A) that 1) most mRNAs exhibit low connectivity, 2) only a minority demonstrates high connectivity, and 3) the constructed network exhibits scale-free properties. The dissTOM matrix was constructed by leveraging the topological overlap matrix (TOM) similarities to quantify gene expression dissimilarities. This matrix forms the foundation for clustering and subsequent module identification, and the “cutreeDynamic” algorithm from the WGCNA package was employed for dynamic pruning to discern 16 modules encompassing all genes. Module sizes ranged from a minimum of 33 genes to a maximum of 1747 genes with only 25 genes included in the gray module. The gray module was excluded from subsequent analyses, and the number of genes in each module is detailed in Table S1.
Figure 5.
Results of weighted gene co-expression network analysis (WGCNA):(A) Scale-Free Topology Analysis, frequency distribution of the number of connections (i.e., node degree, k) in the network (left), and a test of the scale-independent nature of the network (right);(B) Clustering of Module Eigengenes;(C) Gene Dendrogram and Module Colors and (D) Module Eigengene Correlation Heatmap.
Figure 5.
Results of weighted gene co-expression network analysis (WGCNA):(A) Scale-Free Topology Analysis, frequency distribution of the number of connections (i.e., node degree, k) in the network (left), and a test of the scale-independent nature of the network (right);(B) Clustering of Module Eigengenes;(C) Gene Dendrogram and Module Colors and (D) Module Eigengene Correlation Heatmap.
Utilizing the characteristics of Module Eigengenes (MEs), the correlation between individual genes and their corresponding modules can be precisely quantified. This correlation coefficient serves as a pivotal metric for assessing whether a gene functions as a hub gene within its module.
Figure 5(B) presents the correlation clustering tree dendrograms for the 15 identified modules, indicating a similarity threshold below 0.7 (with merge heights exceeding 0.3), thereby avoiding dynamic pruning. To pinpoint the key modules with the most robust correlations to sample traits, notably tumor subtypes, we assessed the gene module-sample subtype correlations (
Figure 6). In this evaluation, categorical labels were assigned a value of 1, with non-relevant categories receiving 0. Employing a distinctive heat coding strategy for labels, we conducted three separate analyses to determine the most pertinent central genes for each category. Pearson correlation coefficients were employed to gauge the relationship between feature genes, i.e., those linked to sample characteristics and the categorical variables denoting tumor subtypes. This methodology resulted in quantifying the correlation between tumor subtypes and feature genes across modules. The “Turquoise,” “Brown,” and “Black” modules demonstrated the most pronounced correlation with the three tumor subtypes. Within these pivotal modules, we initially determined the intramodular connectivity (kWithin) for each gene, reflecting the strength of its interaction with other genes within the same module, as calculated using the following formula:
where,
is the Pearson correlation coefficient within the module for the two genes.
Figure 7.
Correlation of different modules with different subtypes.
Figure 7.
Correlation of different modules with different subtypes.
The greater a gene’s intramodular connectivity, the more pivotal its role within the module and the more closely it interacts with other genes. We assessed each gene’s correlation with tumor subtypes (GS) and its agreement with the module’s signature genes (MM). Hub genes are characterized by high GS, high MM, and high within. Accordingly, for subtype I samples, genes were selected with GS>0.45 and |MM|>0.8; for subtype II samples, with GS>0.2 and |MM|>0.7; and for subtype III samples, with GS>0.4 and |MM|>0.8. Thereafter, by ranking genes based on kWithin in descending order, the study identified 16 hub genes for Subtype I, 9 for Subtype II, and 8 for Subtype III.
Table 1 presents an exhaustive compilation of hub genes for the three sample subtypes. Among the hub genes of subtype I,
MAGI2-AS3 facilitates the progression of gastric cancer by sequestering miR-141/200a, thereby sustaining the overexpression of
ZEB1[
38], an epitranscription factor that plays a role in the regulation of epithelial-mesenchymal transition (EMT), a pivotal process in cancer metastasis and invasion.
MAGI2-AS3, through its interaction with miRNAs, is implicated in the modulation of the tumor microenvironment, impacting tumor cell proliferation, migration, and invasion. Concurrently,
MAGI2-AS3 advances the progression of colorectal cancer by manipulating the miR-3163/
TMEM106B axis. It functions as a molecular sponge for miR-3163, inhibiting the suppressive effect of miR-3163 on
TMEM106B, which results in the upregulation of
TMEM106B expression and consequently fuels tumor cell proliferation and migration[
39].
Table 1.
Hub genes for the three subtypes (sorted according to Kwithin).
Table 1.
Hub genes for the three subtypes (sorted according to Kwithin).
Subtype I |
kWithin |
Subtype II |
kWithin |
Subtype III |
kWithin |
MAGI2-AS3 |
371.7129 |
RP11-416A17.6 |
55.3853 |
SPARC |
34.4698 |
TTC28 |
345.3234 |
RP11-166B2.3 |
52.2037 |
FAP |
33.0935 |
RBMS3 |
345.2273 |
RP11-192H23.7 |
50.8855 |
BGN |
29.6813 |
CNRIP1 |
338.6266 |
MALAT1 |
46.6874 |
SULF1 |
29.6049 |
PLEKHO1 |
323.7333 |
RP11-49O14.2 |
46.2525 |
CDH11 |
28.0017 |
GYPC |
315.0139 |
CTD-2014D20.1 |
46.0766 |
PRRX1 |
26.9047 |
C20orf194 |
313.7970 |
LA16c-431H6.6 |
45.3105 |
THY1 |
26.4728 |
CLIP4 |
312.4037 |
NPIPB5 |
40.0239 |
NOX4 |
25.9135 |
FOXN3 |
309.4977 |
RYKP1 |
39.8201 |
|
|
ATP8B2 |
300.8144 |
|
|
|
|
RP11-875O11.1 |
286.9392 |
|
|
|
|
PDE1A |
254.0221 |
|
|
|
|
NR3C1 |
249.1351 |
|
|
|
|
SLC9A9 |
248.7150 |
|
|
|
|
NR2F2-AS1 |
245.9516 |
|
|
|
|
RP11-730A19.9 |
226.7302 |
|
|
|
|
Among the hub genes of subtype II,
MALAT1 has been identified as intimately linked to the development, progression, and metastasis of various human cancers. It exhibits elevated expression in colorectal cancer tissues, contributing to the enhanced growth of SW480 and HCT116 colorectal cancer cells[
40,
41]. Additionally,
MALAT1 is deeply implicated in gastric carcinogenesis through diverse molecular pathways. For instance, it augments the proliferation of gastric cancer cells by downregulating the expression of miRNAs such as miR-122, miR-1297, miR-22-3p, and miR-202, and by repressing the activity of the oncogene
PCDH10, thereby promoting the growth and invasiveness of gastric cancer[
42].
Within the hub genes of subtype III,
SPARC was shown to amplify the chemosensitivity of 5-FU by facilitating apoptosis. Our findings indicate that both cleaved
PARP and cleaved caspase-3 levels were increased after overexpression of
SPARC protein. Additionally,
Bax, a pivotal protein in the apoptotic process, was significantly upregulated in SGC-7901 and BGC-823 cells with heightened
SPARC expression. These outcomes implicate that
SPARC may induce apoptosis in gastric cancer through the activation of the PARP/caspase-3 pathway[
43]. Expression levels of the
SPARC gene are notably correlated with clinical attributes of colorectal cancer, such as tumor stage, suggesting its potential as a biomarker for colorectal cancer[
44].
3.3. Impact of Hub Genes on the Development of Gastrointestinal Tumors
Gene Ontology (GO), established by the Gene Ontology Consortium, serves as a comprehensive database that catalogs the functional roles of genes and their transcriptional and translational products within biological processes. Our analysis of the pathways involving hub genes in gastrointestinal cancer subtypes aims to delineate their biological functions and the pathways in which they participate.
Figure 7(A) illustrates the outcomes of GO analysis for subtype I with a primary focus on the Molecular Function (MF) category of GO. In this representation, dots correspond to distinct biological processes, and the magnitude of each dot is proportional to the gene count associated with the process. In addition, the color gradient reflects the adjusted p-value (p.adjust), denoting the statistical significance of enrichment. Cannabinoid receptors contribute to various intestinal physiological processes, including peristalsis, secretion, and epithelial barrier function. Research indicates that the deletion of cannabinoid receptor 1 can result in intestinal inflammation and cancer[
45]. Activated cannabinoid receptors, notably CB1 and CB2, are recognized for their role in modulating inflammatory responses and tumor cell proliferation[
46]. Glucocorticoid receptors are pivotal in regulating immune responses, inflammation, and cell survival. In the context of gastrointestinal cancers, glucocorticoids may modulate the tumor microenvironment via their receptors, potentially impacting tumor growth and metastasis by suppressing inflammation and regulating immune cell activity. Specifically, in colorectal cancer, glucocorticoids might facilitate cancer cell proliferation and invasion through the GR-CDK1 signaling pathway[
47].
Figure 7.
GO analysis results: (A) subtype I (B) subtype III.
Figure 7.
GO analysis results: (A) subtype I (B) subtype III.
Figure 8(B) shows the results of GO analysis for subtype III, focusing mainly on the GO category ‘Biological Process (BP)’. Most of these functions are associated with extracellular matrix (ECM) interactions, cell migration, and tissue development. Engagement with the extracellular matrix is an essential component of the tumor microenvironment in gastrointestinal cancers, influencing the invasive and migratory capacities of tumor cells. The interaction between tumor cells and the ECM has the potential to either advance or retard tumor progression[
48].
Genes like MAGI2 that encodes long non-coding RNA MAGI2-AS3 and RBMS3 that encodes the RNA Binding Motif Single Stranded Interacting protein3 participate in a spectrum of regulatory processes, encompassing signal transduction, gene expression modulation, and intercellular communication. This participation may indicate that subtype I is particularly dynamic in the realms of cellular signaling and gene expression regulation. Genes within subtype II might be more engaged in specialized regulatory roles, such as the involvement of non-coding RNA in transcriptional regulation, and could be pivotal in specific physiological or pathological contexts, including the modulation of gene expression in response to environmental stresses or disease conditions. Genes implicated in extracellular matrix interactions and tissue remodeling, such as SPARC and FAP, frequently contribute to tissue development, repair, and the cancer microenvironment. The roles of these genes suggest that subtype III may be instrumental in governing extracellular matrix dynamics and adaptations under pathological conditions.
This section may be divided by subheadings. It should provide a concise and precise description of the experimental results, their interpretation, as well as the experimental conclusions that can be drawn.