1. Introduction
Gestational Diabetes Mellitus (GDM) refers to the initial onset of impaired glucose intolerance during pregnancy. The global estimated prevalence of GDM varies 6.1% to 15.2% [
1]. There is a potential negative correlation between GDM and the consequences of mother and offspring. GDM mothers are at an increased risk of developing gestational hypertension, infection, polyhydramnios and cesarean delivery [
2]. GDM infants are at a risk of developing macrosomia, congenital heart disease, erythrocytosis and hypoglycemia [
3,
4]. In addition, Mothers with GDM have an high risk of developing obesity, type 2 diabetes mellitus (T2DM) and cardiovascular diseases alongside their children in the future [
5,
6,
7]. Therefore, it is crucial from a clinical and health perspective to investigate the development of GDM, enabling early screening, accurate diagnosis and timely intervention to protect the wellbing of both the mother and the fetus. While knowledge concerning the detailed mechanisms underlying the initiation and progression of GDM remains unknown and is still a key obstacle on the road of GDM treatment. The development of stable and accurate biomarkers will greatly facilitate the early detection GDM related biological features. Therefore, it is urgent to identify more additional potential biomarkers for GDM.
Senescence refers to the durable cell cycle arrest triggered by stress in cells that were previously replication-competent cells [
8]. And senescent cells exhibit a series of characteristics including enduring cessation of growth, upregulation of anti-proliferative genes and initiation of injury sensing signaling pathways resulting in the expression of various senescence-related transcripts [
9]. Previous studies showed that human umbilical cord mesenchymal stromal cells (hUC-MSCs) exhibit reduced cell growth, accelerated cellular senescence and diminished differentiation potential in GDM subjects [
10]. Human umbilical cord endothelial cells (HUVECs) affected by gestational diabetes also display on early endothelial senescence [
11]. In senescence, the immune system weakens in its ability to respond against pathogens and hyperglycemic environment. This decline in immune function is known as immune-senescence and involves alterations in the ratio of naïve to memory T cell ratio, CD4 to CD8, compromised calcium-dependent signaling and thymic atrophy. However, immune-related senescence (IRS) biomarkers have not been studied by bioinformatics analysis.
After gene microarray and high-throughput sequencing technologies were developed, gene expression profiling became a crucial method for investigating disease mechanisms and identifying important diagnostic and treatment biomarkers [
12,
13]. Additionally, machine learning, a branch of information science, involves training computers to perform tasks by identifying patterns in vast datasets and using them to establish rules or algorithms that enhance task achievement [
14]. Studies indicate that machine learning is advantageous for identifying predictive biomarkers and diagnosing GDM [
15,
16,
17,
18].
In our study, we explore the diagnosis values of IRS genes in GDM via bioinformatics methods and machine learning. Firstly, we conducted weighted gene co-expression network analysis (WGCNA) exploring gene expression profiles in GDM samples as well as control samples. WGCNA takes into correlations among transcripts to identify potential IRS genes and co-expression modules, and also explores the associations between these modules and disease’s traits, as well as within-modular relationships. Secondly, we pinpointed gene modules closely related to the advancement of GDM and screened potential biomarkers through a combination with univariate logistic regression, least absolute shrinkage and selection operator (LASSO) algorithm analyses and protein–protein interaction (PPI) network. Then, we detected linked pathways through single sample gene set enrichment analysis (ssGSEA). Additionally, unsupervised cluster analysis was conducted in GDM patients to explore immune infiltration. Finally, we identified these genes through a real-time quantitative polymerase chain reaction (RT-qPCR) and western blot in placental tissues to verified the expression of potential genes, providing new perspectives for the identification and management of GDM.
2. Results
2.1. Establishment of the Immune and Aging Related Biomarker Signature
The study procedure was presented in Figure 1. Immune and Aging score of GDM and Control group were calculated by ssGSEA algorithm and those scores of GDM were significantly higher than those of Controls (Figure 2A,B). Then, genes were divided into different clusters based on those scores (Figure 2C), and the value of β was established as 19 to make the interplay between genes approach the scale-free distribution (Figure 2D). Next, clusters were further merged into different modules which including at least 300 genes, and we obtained 7 modules eventually (Figure 2E,F). Finally, after a correlation analysis between these modules and scores, we found that the MEblack module including 2297 genes has the highest correlation with the immune and aging score (Figure 2G,H).
Figure 1. The workflow of current study. GSE: gene expression omnibus series; GDM: gestational diabetes mellitus; WGCNA: weighted gene co-expression network analysis; ssGSEA: single sample gene set enrichment analysis; DEGs: differentially expressed genes; PPI: protein–protein interaction; IRS: immune-related senescence; GO: gene ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; LASSO: Least absolute shrinkage and selection operator; TF: transcription factor; GSEA: gene set enrichment analysis; ROC: receiver operating characteristic.
Figure 2. Identification the immune and aging related genes. The immune score (A) and aging score (B) sets by ssGSEA. (C) Determine soft‐threshold power for WGCNA: Analysis of the scale‐free fit index for various soft‐threshold powers (Left). Analysis of the mean connectivity for various soft‐threshold powers (Right). When the soft domain value is set to 19, data becomes stable and is suitable for WGCNA. (D) Clustering tree based on modules’ eigengenes. (E) Merging of modules. The minimum number of genes for modules was 300, deepSplit = 3, and cutHight =0.4. Finally, 7 non-grey modules were obtained. (F) Heatmap of correlation between ME and GDM modules. In black module, there was a strong positive correlation between module membership and Immune score (G) or Aging score (H) (cor = 0.29&p<0.001, cor = 0.51&p<0.001). ssGSEA: single sample gene set enrichment analysis; WGCNA: weighted gene co-expression network analysis; GDM: gestational diabetes mellitus
2.2. Screening of Differentially Expressed Genes
A total of 467 DEGs between GDM and Control samples were obtained by differential analysis, including 144 up-regulated and 323 down-regulated genes (Figure 3A) (
Table S3). The heatmap showed the expression of all the DEGs. And we found the genes expression pattern of GDM was distinguished from that of Control (Figure 3B). These DEGs were intersected with 2297 genes in the MEblack module to obtain 32 potential genes (Figure 3C). Then, we found that potential genes were enriched in 14 GO-Biological process (GO-BP) functions including regulation of neuron death, proton transmembrane transport, etc. 2 GO-Cellular components (GO-CC) functions containing inner mitochondrial membrane protein complex and mitochondrial inner membrane, 5 GO-Molecular functions (GO-MF) consisting of GDP binding, solute:cation symporter activity, etc. (Figure 4A,B), and 6 KEGG pathway containing Oxidative phosphorylation, Autophagy - animal, etc (Figure 4C, D). In addition, a PPI network containing 18 nodes and 15 edges was generated. Of note,
SLC25A3 has a correlation with 4 other genes containing
COX17,
SLC20A1,
NDUFA6, and
UBE2M (Figure 4E).
Figure 3. The volcano plot and the hierarchical clustering heatmap. (A) The volcano plot. The red plots represent up-regulated probes, the blue plots represent down-regulated probes, and the gray plots represent probes with no significant differences. (B) The hierarchical clustering heatmap. The blue classification bar on the top represents GDM samples, whereas the red bar represents normal controls. The color bar on the right represents the expression value. For a gene, red color means high, whereas blue means low. (C) Venn graph for DEGs. GDM: gestational diabetes mellitus; DEGs: differentially expressed genes
Figure 4. Enrichment analyses. The significant cellular processes and signaling pathways were demonstrated by GO (A, B) and KEGG (C, D) analyses. The y-axis was the name of cellular processes or signaling pathways, and the x-axis was the number of genes. The size of the bubble indicates the gene count, while colors indicate the significance of enrichment. (E) Protein–protein interaction (PPI) network of target genes. GO: gene ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; PPI: protein–protein interaction
2.3. Univariate Logistic Regression and LASSO Algorithm Analyses
Our findings led us to perform univariate logistic regression model. The results revealed that 23 IRS genes, including 20 protection factors and 3 risk factors, had potential prognostic value (Figure 5A). After further filtering by the LASSO algorithm, 5 of these genes (COX17, RRAS2, RRAGC, CECR1, and TACC2) remained and defined as core IRS genes (Figure 5B,C). Based on the Area under the curve (AUC) of ROC curve derived from the training set and validation set (both greater than 0.7), the prognostic prediction performance of biomarkers was verified preferable (Figure 5D,E). Subsequently, a nomogram was used to forecast the prognosis of GDM (Figure 5F). The result and subsequent calibration curve showed a preferable prognostic prediction performance (Figure 5G). Finally, we generated a decision curve and found a significant benefit (Figure 5H).
Figure 5. Univariate logistic and LASSO regression model detection. (A) Univariate logistic regression model shows the 23 potential genes. (B, C) LASSO regression model further screened the potential genes (COX17, RRAS2, RRAGC, CECR1, and TACC2). ROC curves for evaluating the diagnostic ability of the remained genes in the training set (GSE70493) (D) and validation set (GSE103552) (E). (F) A nomogram model to validate the impact of the 5 feature biomarkers on diagnostic effectiveness. (G) Calibration curve for the predictive power of the nomogram model. (H) DCA for the nomogram model. LASSO: Least absolute shrinkage and selection operator; ROC: receiver operating characteristic; GSE: gene expression omnibus series; DCA: decision curve analysis.
2.4. Functional Enrichment Analyses
We used GSEA and GO/KEGG analyses to investigate the potential biological roles of the 5 core genes. Results of ssGSEA identified that
CECR1 was enriched in 3091 GO functions including cytokine production involved in immune response, cell activation related to immune response, etc. and 171 KEGG pathways containing Chemokine signaling pathway, mRNA surveillance pathway, etc. (Figure 6A,
Figure S1A).
COX17 was enriched in 2519 GO functions such as triggering immune response, regulation of immune effector process, etc. and 147 KEGG pathways including Autophagy - animal, Proteasome, etc. (Figure 6B,
Figure S1B). And
RRAGC was enriched in 2945 GO functions including activation of innate immune response, humoral immune response, etc., and 190 KEGG pathways containing cell cycle, Ribosome, etc. (Figure 6C,
Figure S1C). Besides,
RRAS2 was correlated with 3002 GO functions such as activation of immune response, regulation of immune effector process, etc., and 165 KEGG pathways containing Calcium signaling pathway, Nucleocytoplasmic transport, etc. (Figure 6D,
Figure S1D).
TACC2 was associated with 2499 GO functions including activation of innate immune response, humoral immune response, etc. (Figure 6E,
Figure S1E). The total enrichment information was provided in
Tables S4–S13. In addition, the network governance diagram generated by IPA demonstrated that
CECR1, which was also named after
ADA2, was correlated with
TP63 and
IFNG, and
COX17 was correlated with
BCR and
AFM (Figure 6F).
Figure 6. Single gene enrichment analyses. ssGSEA and GO analysis the latent biological roles of CECR1 (A), COX17 (B), RRAGC (C), RRAS2 (D) and TACC2 (E). (F) IPA shows the corresponding genes associated with CECR1 and COX17. ssGSEA: single sample gene set enrichment analysis; GO: gene ontology; IPA: ingenuity pathway analysis.
2.5. Construction Regulatory Network
TFs could regulate the transcription of genes, we constructed a TF-mRNA network and found that
RRAGC was regulated by 7 TFs (
YY1,
SMAD4,
ZNF692,
SMAD2,
BACH2,
ZNF343, and
HBP1),
TACC2 was regulated by 5 TFs (
AFF1,
AFF4,
PBRM1,
ARNT2, and
TBX3),
RRAS2 was regulated by 3 TFs (
PRDM1,
ZNF468, and
GABPA),
CECR1 was regulated by
IRF8 and
PLEK, and
COX17 was regulated by
GABPB1 (
Figure S2A). Then, 18 DE-miRNAs, which including 10 up-regulated miRNAs and 8 down-regulated miRNAs, between GDM and Control patients were obtained to be intersected with biomarkers-related miRNAs detected by the software “mirwalk” (
Figure S2B) (
Table S14). There were 2 intersection miRNAs between down-regulated mirwalk miRNAs and up-regulated DE-miRNAs and 1 intersection miRNA between up-regulated mirwalk miRNAs and down-regulated DE-miRNAs obtained eventually to construct an mRNA-miRNA network (
Figure S2C,D). Notably,
CECR1 was regulated by
hsa-miR-134-5p and
hsa-miR-195-5p, and
RRAGC was regulated by
hsa-miR-188-5p (
Figure S2E). Besides, a biomarkers-kinases network was generated and we found that 4 kinases (
RAF1,
ARAF,
ACVR1, and
BMPR1B) regulated the
RRAGC, 2 kinases (
TTK and
AURKC) regulated the
TACC2, and kinase
TRIM24 regulated the
COX17 (
Figure S2F). Finally, we merged these three networks into one and found that only
RRAGC was regulated by 7 TFs (
YY1,
SMAD4,
ZNF692,
SMAD2,
BACH2,
ZNF343, and
HBP1), 4 kinases (
RAF1,
ARAF,
ACVR1, and
BMPR1B), and
hsa-miR-188-5p at the same time (
Figure S2G).
2.6. Recognition Subtypes and Immune Infiltration in GDM
Using the five diagnostic biomarkers, we conducted consensus cluster analysis for GDM samples categorized into different subtypes. The heatmap indicated an ideal classification of GDM samples at K = 2, with 12 samples in Cluster A and 20 samples in Cluster B in training set (Figure 7A-C). The expression of TACC2 exhibited higher in Cluster B (Figure 7D). The PCA score plot revealed clear clusters for GDM patients (Figure 7E). The CIBERSORT algorithm was employed to conduct the proportion of different infiltrating immune cell types between Cluster A and B. After excluding populations with a total immune abundance value of 0, the Wilcox test algorithm was utilized for 16 immune cell populations, including naïve B cells, plasma cells, CD8+ T cells, naïve CD4+ T cells, follicular helper T cells, regulatory T cells (Tregs), resting NK cells, activated NK cells, monocytes, M0 macrophages, M1 macrophages, M2 macrophages, activated DCs, resting mast cells, activated mast cells and Neutrophils (Figure 7F). Heatmap showed the relationship between immune cells (Figure 7G). CECR1 was positive with Neutrophils and negative with resting NK cells (Figure 7H–I). RRAGC was positive with CD8+ T cells and negative with naïve CD4+ T cells (Figure 7J–K). TACC2 was negative with Dendritic cells activated (Figure 7L). We also performed consensus cluster analysis in validation set. The results were similar and shown in Supplementary Figure 3. Complexheatmap showed the immune infiltration landscape between cluster A and B in training set (Figure 8A) and validation set (Figure 8B).
Figure 7. Recognition subtypes in GDM based on the five core genes and immune infiltration in GSE70493 datasets. (A-C) Clustering matrix plot at k = 2 via unsupervised clustering analysis; (D) Five core genes expression in the two clusters. (E) PCA of GDM clusters. (F) Bar plot showed percentage infiltration of 22 immune cells in clusters. (G) Correlation heatmap of 22 immune cell types. CECR1 was significantly correlated with Neutrophils (H) and NK cells resting (I). RRAGC was significantly correlated with T cells CD4 naive (J) and T cells CD8 (K). TACC2 was significantly correlated with Dendritic cells activated. GDM: gestational diabetes mellitus; GSE: gene expression omnibus series; PCA: principal component analysis.
Figure 8. Distribution immune cell infiltration in the two clusters. Complexheatmap shows immune infiltration landscape between the two subtypes assessed using Cibersort, Estimate, xCell, MCPcounter, and ssGSEA in GSE70493 datasate (A) and GSE103552 (B). MCP: microenvironment cell population; ssGSEA: single sample gene set enrichment analysis; GSE: gene expression omnibus series.
2.7. Functional Analysis of Clusters
A total of 1711 DEGs between cluster A and B in GDM were obtained by differential analysis, including 515 up-regulated and 1196 down-regulated genes (
Table S15). GO and KEGG analyses were utilized to identify the biological efficacy of DEGs using the Metascape database. The analysis indicated that the DEGs were mainly linked to Translation, Parkinson disease, ribonucleoprotein complex biogenesis, mitochondrion organization, Protein processing in endoplasmic reticulum and so on (
Figure S4). The top 20 clusters with their representative enriched terms were listed in
Table S16.
2.8. Validation in Clinical Tissue
To further confirm the 5 core genes, we performed a RT-PCR in 6 GDM and 6 control placental samples. We found that CECR1 was down-regulated, COX17, RRAS2, and TACC2 was up-regulated in GDM samples in comparison with controls (Figure 9A). The primer sequence was shown in Table 1. And we also performed western blotting, while only CECR1 and TACC2 left (Figure 9B). The results offered the evidence of therapeutic value of CECT1 and TACC2 in GDM management.
Figure 9. Validation in clinical tissue. Identification hub genes in the placenta via the RT-qPCR(A) and western blot (B). RT-qPCR: real-time quantitative polymerase chain reaction.
3. Discussion
Gestational Diabetes Mellitus (GDM) is a complex and multi-factorial metabolic disorder, while characterized by glucose intolerance with initial recognition during pregnancy [
32,
33]. During pregnancy, GDM is related with increasing risks of adverse outcomes, including stillbirth, preterm birth, large for gestational age, pre-eclampsia and neonatal hyperinsulinaemia [
34,
35,
36]. Women with a history of GDM have been reported to have elevated risks of developing T2DM, chronic kidney disease, metabolic syndrome, cardiovascular and cerebrovascular diseases in the future [
6,
37,
38,
39,
40]. The pathogenesis of GDM involves a combination of genetic, environmental, and physiological factors [
2,
41,
42]. Recent studies showed that aging and immune-related mechanisms have a crucial influence on the development of GDM. Senescence due to hyperglycemia could disrupt the function of human umbilical cord mesenchymal stromal cells (hUC-MSCs) and endothelial cells, and might cause abnormal differentiation of embryo, thereby predisposing them to the development of GDM [
10,
43]. Furthermore, immune-related mechanisms, such as alterations in the maternal immune response and inflammatory processes, have been implicated in the pathogenesis of GDM [
21,
44,
45]. However, the mechanism of IRS in GDM remains unclear. Understanding the interplay between senescence- and immune-related mechanisms is critical for elucidating the pathogenesis of GDM, facilitating prevention and targeting interventions to mitigate the risk and complications associated with GDM.
Current researches have emphasized the effect of abnormal metabolism and immunometabolism, which involving glucose, lipid and energy metabolism, in the progress of GDM [
2,
46]. Metabolic regulation influences the immunological condition of GDM by modifying the immune microenvironment [
47,
48]. In this study, we utilized WGCNA to identify modules most correlated with IRS genes in GDM, which might be pivotal in the development and progression of GDM. Through the intersection of differentially expressed genes and module genes obtained via WGCNA, we derived 32 of the most crucial IRS genes. Prognostic models were constructed for these 32 genes using univariate logistic regression and Lasso regression. Subsequent analysis involving nomogram and ROC curve indicated that the five core genes exhibited significant clinical application value. Further, consensus cluster analysis divided GDM samples into two clusters, allowing for the identification of immune infiltration landscape and DEGs between the groups. This sheds light on the heterogeneity of IRS in GDM. Lastly, validation of key genes expression in clinical samples suggests a potential biomarker for GDM.
We used multiple bioinformatics analysis to explore the differential expression of IRS genes between the normal and GDM group. GO enrichment analysis revealed significant enrichment of DE-D genes in the biological process related to neuron death, proton transmembrane transport and mitochondrial transmembrane transport. Furthermore, the KEGG signaling pathways involved in thermogenesis, autophagy and oxidative phosphorylation might facilitate the involvement of DE-D genes to the development of GDM. Brown and beige adipocytes have the capacity to catabolize stored energy into heat, and this thermogenic ability could be harnessed as a treatment for metabolic disorders such as obesity and T2DM. Thermogenic adipocytes generate heat by coordinating the supply of substrate with mitochondrial oxidative machinery and controlling the speed of substrate oxidation [
49]. Various studies have reported associations between thermogenesis and biological process, such as insulin sensitivity, energy metabolism and thyroid disease [
50,
51,
52]. In addition, autophagy is a constantly changing mechanism that breaks down and reuses cellular organelles and proteins in order to uphold the stability of the cell. GDM patients have an increased risk of neonatal infection and fetal pancreas damage due to inflammation and autophagy [
53,
54,
55].
Based on the machine learning algorithms, CECR1, COX17, RRAGC, RRAS2, and TACC2 were recognized as five diagnostic biomarkers. The ROC analysis indicated considerable diagnostic significance of these five biomarkers for GDM. Eventually, only CECR1 and TACC2 were left after RT-PCR and western blotting examinations.
The genetic locus
CECR1 on chromosome 22q11.2 codes for a protein consisting of 511 amino acids that exhibit adenosine deaminase functionality, with reduced binding capabilities for adenosine and 2’-deoxyadenosine compared to
ADA1.
ADA2, an intracellular enzyme produced by the
ADA gene located on chromosome 20q13.11, is thought to facilitate the breakdown of extracellular adenosine and potentially serve as a stimulant for macrophage growth and differentiation. Deficiency in
ADA2 inheritance has been associated with a diverse syndrome characterized by autoinflammatory, vasculopathy and immune manifestations [
56,
57,
58,
59]. Differentiation of macrophages and abnormalities of inflammation and immune have been reported in GDM and senescence related diseases [
60,
61,
62]. While
CECR1 has never been studied in GDM or other obstetric related diseases.
CECR1 might be involved in the pathogenesis of GDM through its effects on inflammation or immune function, and the specific mechanism needs further study.
TACC2 is a member of transforming acidic coiled-coil proteins (TACCs), which are recognized as centrosomes/microtubules interaction-associated proteins that contain a highly conserved C-terminal coiled-coil “TACC domain” [
63].
TACC2 has been implicated in regulating microtubule dynamics, a process crucial for a wide range of cellular activities, including cell migration, cytoskeletal organization, and intracellular transport [
64,
65,
66]. And mounting evidence suggests that aberrant expression of
TACC2 participated in the progression of some carcinomas, such as prostate cancer [
67], hepatocellular carcinoma [
68], breast cancer [
65] and infant acute lymphoblastic leukemia [
69].
TACC2 has never reported in GDM. And further extensive researches are need to identify their significance in GDM. Our results indicated numerous potential functions of elements
CECR1 and
TACC2 in the progression of GDM.
To further characterize the core IRS genes in GDM, we performed consensus cluster analysis and divided GDM samples into two clusters. Analysis of GO and KEGG pathways showed that the DEGs in the two clusters were linked to processes such as translation, mitochondrion organization and endoplasmic reticulum. Afterward, the CIBERSORT algorithm revealed a marked increase in the presence of regulatory T cells, plasma cells, monocytes, macrophages M0, M1, and M2 in cluster 2. Macrophages, as key components of the immune system, exhibit a high degree of plasticity. When exposed to inflammatory stimuli, macrophages could differentiate into M1 and M2 phenotypes, which displaying proinflammatory and anti-inflammatory actions, leading to a dual immune response to GDM [
70,
71,
72]. Then we detected immune infiltration landscape using five assessments. These findings shed light on the immunological aspects in GDM.
The study has certain limitations. First, while the biomarkers were validated in a single centre, their broader clinical utility could be better understood through a prospective, largescale, and multi-center clinical trial. Furthermore, the specific roles of CECR1 and TACC2 in GDM remain unknown and further experimental exploration is warranted to elucidate their biological functions and underlying mechanisms in GDM patients.
Conclusively, our research has pinpointed CECR1 and TACC2 as pivotal biomarkers linked to immune response and senescence through employing machine learning algorithm and clinical validation. These findings offer a novel perspective on the development of GDM.
4. Materials and Methods
4.1. Data Collection
RNA sequencing data (RNA-seq) was downloaded from Gene Expression Omnibus (GEO) cohort (
https://www.ncbi.nlm.nih.gov/geo/). GSE70493 microarray including 31 Control and 32 GDM samples of placental tissue was used as a training set, and GSE103552 microarray consisting of 17 Control and 20 GDM samples of placental endothelial cells was used as a validation set. Besides, 1793 Immune-related genes (
Table S1) were sourced from the immunology database and analysis portal (ImmPort) database (
https://www.immport.org/shared/home) [
19]. A total of 307 Aging-related genes (
Table S2) were acquired from the CellAge database (
https://genomics.senescence.info/cells/) [
20].
4.2. Weighted Gene Co-Expression Network Analysis (WGCNA)
In accordance with the expression of 1793 Immune-related genes and 307 Aging-related genes, ssGSEA was utilized to count the immune score and aging score of Control and GDM samples by the R package “GSVA” (version 1.42.0) [
21]. WGCNA was conducted to obtain genes with a strong correlation with immune and aging based on these scores. Firstly, the “goodSamplesGenes” function in the R package “WGCNA” (version 1.70-3) was used to estimate whether there were outlier genes need to be eliminated [
22]. Then, the value of the soft threshold value (β) was verified to make sure the mutual interaction between genes as much as possible approach the scale-free distribution. Next, a dynamic tree cut algorithm was used to generate modules and merge them. Finally, we conducted a correlation analysis between modules and immune score or Aging score.
4.3. Screening and Analyses of Potential Genes of GDM
Differentially expressed genes (DEGs) between Control and GDM specimens in GSE70493 microarray were obtained by the R package “limma” (version 3.44.3) with
p < 0.05 and |log2FC| > 0.1[
23]. These DEGs were intersected with genes in the module which screened out by WGCNA previously to obtain IRS genes. Then, the R package “clusterProfiler” (version 4.2.2) with a
P value < 0.05 was used to perform enrichment analysis of potential genes based on the Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) database [
24]. Finally, we build a PPI network with minimum required interaction score of 0.7 through the STRING website (
https://cn.string-db.org/) to find out whether there was reciprocity among potential genes.
4.4. Sifting and Validation of Biomarkers
A univariate logistic regression model was constructed and then we performed LASSO algorithm based on the model to obtain biomarkers. The performance of these biomarkers in predicting prognosis was assessed by Receiver operating characteristic (ROC) analysis based on the training set and the validation set. Next, feature genes were used to generate a nomogram by the “lrm” function in the R package “rms” [
25] (version 6.2-0), and a calibration curve was created at the same time to verify the prediction performance of the nomogram. Finally, a decision curve was plotted to detect the net benefit rate of biomarkers.
4.5. Function Enrichment Analyses of Biomarkers
Gene set enrichment analysis (GSEA) of biomarkers was conducted by the R package “clusterProfiler” based on the GO and the KEGG database, and we visualized the top 10 GO functions or KEGG pathways with the highest correlation with biomarkers [
24]. Besides, Ingenuity pathway analysis (IPA) was conducted to find upstream and downstream regulators regulated by biomarkers.
4.6. Construction of a Regulatory Network of Biomarkers
The expression data of 1267 transcription factors (TFs) in humans were obtained from the AnimalTFDB3.0 database (
http://bioinfo.life.hust.edu.cn/AnimalTFDB#!) [
26]. The Pearson correlation coefficient between TFs and biomarkers was computed by the R package “psych” (version 2.1.9). TF-mRNA pairs with |cor| ≥ 0.7 and
p < 0.05 were selected to construct a TF-mRNA network by the software “cytoscape” (version 3.8.2). Then, differentially expressed miRNAs (DE-miRNAs) (
p < 0.05 and |log2FC| > 0.5) between Control and GDM samples in GSE104297 microarray were obtained by the R package “DESeq2” (version 1.34.0) [
27]. Biomarkers-related miRNA was detected by the software “mirwalk” (version 3.0), and these were intersected with DE-miRNAs to retain correlation pairs with the opposite trend to generate an mRNA-miRNA network [
28]. Besides, upstream kinases of biomarkers were detected by the “X2Kgui” tool (version 1.6.1207) to generate a biomarkers-kinases network [
29]. Finally, the TF-mRNA network was merged with the mRNA-miRNA network and the biomarkers-kinases network to construct a total regulatory network.
4.7. Evaluation of Immune Cells Infiltration in GDM
Based on the diagnostic genes, consensus cluster analysis was conducted to identify the immune modulation patterns of in GDM patients using the R package “ConsensusClusterPlus”. The ideal number of clusters was determined via the cumulative distribution function (CDF), consensus matrix, and comparative change in area under the CDF curve. Then we assessed the immune infiltration landscape between the subtypes via CIBERSORT, estimate, xcell, MCPcounter and ssGSEA. And we also explored the relationships between the core genes and immune cells.
4.8. RT-qPCR
Our experiments received approval from the Ethics Committee of the Affiliated Hospital of North Sichuan Medical College (2023ER223-1). All participants consented in writing to join this study. Total RNA from placental specimens was extracted using the TRIzol reagent (Invitrogen, USA) from 6 GDM and 6 control patients. cDNA was synthesized using Prime Script RT ReagentKit (Takara, China), and qPCR were carried out using the SYBR Premix ExTaq kit (Takara,China). GAPDH was selected as the internal reference gene, and the 2−ΔΔCt method was utilized to standardize the gene expression levels. The primer details is displayed in Table 1.
4.9. Western Blotting Analysis
The placenta samples were lysed using RIPA lysis buffer (Beyotime, China). And we used a BCA protein assay kit (Solarbio, China) for quantifying proteins. Then the proteins were transferred onto nitrocellulose membranes (Millipore, USA) following their separation on SDS-PAGE. TBS was used to wash the membranes 5 times and 5% skimmed milk was used to block the proteins for 1 hour. Next, primary antibodies were used to incubate the membranes overnight at 4°C. Antibodies utilized in present study: GAPDH (dilution: 1:5000, 60004-1-lg, Proteintech, China), CECR1 (dilution: 1:1000, 11299-1-AP, Proteintech, China), TACC2 (dilution: 1:1000, 11407-1-AP, Proteintech, China), P16 (dilution: 1:1000, 10883-1-AP, Proteintech, China), P21 (dilution: 1:1000, 10355-1-AP, Proteintech, China). Subsequently, the membranes were incubated with secondary anti-rabbit IgG antibodies for 1 hour at room temperature and observed via ECL luminescence.
4.10. Statistical Analysis
R software (version 4.1.1) and GraphPad Prism 9 were used for statistical analyses. Box plots and volcano plots were created via the R package “ggplot2” (version 3.3.2) [
30]. Venn plots and scatter plots were created by the R package “ggpubr” (version 0.4.0). The heatmaps were plotted by the R package “Heatmap” (version 4.1.0). ROC curves were constructed by the R package “pROC” (version 1.17.0.1) [
31]. Statistical significance between groups were defined as
p < 0.05 (*
p < 0.05, **
p < 0.01, and ***
p < 0.001).