1. Introduction
Cancer is one of the most globally prevalent diseases currently, affecting millions of individuals worldwide. It is primarily characterized by the abnormal and uncontrolled growth and division of cells, which results in the creation of aggregates of cells known as tumors, in affected tissues. Tumorous cells are capable of proliferating irrepressibly and subsequently migrating into neighboring tissue to affect healthy cells, and inducing similar actions in them, in a condition known as malignancy; following which fully developed tumors invade other tissues via circulatory and lymphatic systems, causing metastasis [
1]. The principal mechanism causing initial tumor formation is the disruption of equilibrium in expression between two classes of genes: Proto-oncogenes, which are positive regulators of DNA replication and cell division, and tumor-suppressor genes which negatively regulate these processes in cells containing any defects or putative tumor-affected cells, thereby preventing them from undergoing division. The action of various external carcinogenic factors, such as chromosomal translocations, genetic mutations and amplifications, or epigenetic changes [
2]; causes the activation and transformation of proto-oncogenes into oncogenes displaying abnormally higher expression. This is typically accompanied by the downregulation of tumor-suppressor genes, leading to the loss of homeostatic balance in affected cells and commencing in their unrestrained cell division and resultant tumor formation and growth facilitated by angiogenesis, while inhibiting normal cell apoptosis, thus promoting the growth and proliferation of tumors. Fully metastasized tumors, which represent later stages of cancer and during which symptoms are most apparent, are highly resistant to regular treatment methods. This causes cancer to have a very low survival rate, as well as an unequal global distribution, with higher prevalence in countries with low socio-economic development, mainly contributed to by a lack of access to adequate diagnostic and treatment facilities.
One of the most well-known oncogenes in cancer is SPP1 (secreted phosphoprotein 1, which encodes the protein Osteopontin, a non-collagenous bone matrix protein belonging to the Small Integrin-Binding Ligand N-linked Glycoprotein (SIBLING) family of glycoproteins, which plays an extensive role in several stages of tumor development [
3], wherein its increased secretion has been found to be correlated with higher risks of malignancy in various forms of cancer [
4]. It binds to the plasma membrane through its signaling receptors CD44 & αvβ3 integrin and has a diverse range of cellular functionalities. SPP1 is involved in the development of the conducive tumor microenvironment (TME) during initial tumor formation through angiogenesis and upregulation of various cellular processes including cell proliferation, adhesion, migration and inter-cellular communication [
5]. Epithelial-mesenchymal transition (EMT) has been found to be mediated by SPP1, especially in its type 3 form, through the upregulation of transcription factors such as TWIST [
6,
7], SNAIL1 [
7] and SNAIL2 [
7], as well as activation of pathways such as PI3K-AKT-TWIST and hypoxia-inducible factor-1 alpha (HIF-1α) in various cancer models. It has been found to promote endothelial cell selection and migration during tumor angiogenesis [
8,
9]. Another of its mechanisms of action in cancer includes the negative regulation of T-cell activation through the overexpression of inhibitory molecules such as PD-L1, which results in T-cell exhaustion and apoptosis, causing immunosuppression in the tumor microenvironment and promoting tumor growth [
10,
11]. Finally, it has also been found to play an important role in the development of tumor resistance to treatment by chemotherapy and radiotherapy through the regulation of several pathways such as autophagy, PI3K/Akt signaling, MAPK signaling and EGFR pathways [
12]. All these properties of SPP1 make it an interesting target with a large range of purposes, including as a useful biomarker for diagnosis of tumor aggressiveness and disease stage and survival in affected individuals [
4,
13,
14]; as well as the development of therapeutic approaches involving SPP1 inhibition to suppress tumor growth, proliferation, and metastasis in patients.
The perturbation of proto-oncogenes and their conversion into oncogenes, as well as their subsequent role in cancer progression, is typically governed by the activity of a variety of biological pathways involving them, which principally enable their functionality. The positive regulation of several tumorigenic pathways associated with cell cycle progression, cell proliferation and cell apoptosis, is brought about through the overexpression and genetic alterations occurring in specific ‘driver’ genes, with many of them being co-occurrent in multiple forms of cancer [
15]. Of note, SPP1 is found to make use of a large repertoire of biological pathways for the same, during which it is highly correlated with a set of specific other oncogenes in tumorous cells [
7]. Gaining further insight into their pathway enrichment profiles is therefore crucial in order to investigate the precise mechanisms of oncogenic activity in disease progression within human tissues.
Additionally, the functionality of genes is also determined by the expression profiles of their constituent ‘transcripts’, or isoforms created by splicing patterns. Due to the high variability of the RNA splicing process, there exists the possibility of a gene having multiple splice variants or protein isoforms, occurring due to selective inclusions and deletions in exonic regions within each isoform, causing them to vary in sequence composition and length after undergoing translation to proteins. As a result, there can be a differential expression between spliced isoforms for the same gene, which ultimately influences their functionality as well as factors such as stability, localization within the genome and liability to transcriptional regulation [
16]. Oncogenic activity in cancer has also been found to be associated with abnormalities in the regulation of alternative splicing, forming larger numbers of differentially expressed transcripts and thereby driving higher protein diversity and functionality in cancer progression [
17], as well as the development of resistance to treatment modalities such as chemotherapy and radiotherapy [
18]. A primary factor driving these variations in splicing events is differential exon usage occurring during the splicing process, which is observed in 90-95% of all genes with multiple transcripts [
19]. Therefore, analyzing DEU patterns in oncogenes occurring during tumorigenesis can potentially provide greater insights into their functions and roles in cancer progression.
In this study, we have conducted a detailed investigation of the extensive role of SPP1 in four particular types of cancer in humans: Breast, Prostate, Renal and Skin; by analyzing its expression profile in these tissues during tumor occurrence and progression, its effects on the enrichment of related cellular pathways, as well as the examining differences in the expression of its splicing variants as a potential cause of modifications in its functionality during tumorigenesis.
2. Materials and Methods
Bulk transcriptomic (paired-end total RNA-Seq) data was collected from the Gene Expression Omnibus (GEO) repositories of 17 previous cancer studies in the aforementioned four types of tissues. Sequenced data was obtained in the form of sequence read archive (SRA) files, which were then processed using a transcriptomic analysis pipeline (
Figure 1), using a set of specific bioinformatic tools. SRA files were converted into the FASTQ format using the NCBI tool ‘fasterq-dump’ (v2.9.1) [
20], followed by trimming using Trimmomatic (v0.39) [
21] in order to remove adapter sequences and filter out reads having a Phred quality score (Q) below 30 and lengths shorter than 50 base pairs. Quality scores were verified using FASTQC (v0.11.8) [
22], which showed all reads to have good quality (Q > 30). Reads were aligned using Hisat2 (v2.1.0) [
23] to the human genome (Hg38 Ensembl v97) in paired-end unstranded mode, following which uniquely mapped reads were quantified for gene & transcript features in the human genome using Subread featureCounts (v1.6.4) [
24].
Read counts obtained were normalized using the TMM method and subjected to differential gene expression (DEG) using the R package ‘edgeR’ [
25]. Genes having uniformly low expression across samples were removed using the default filtering method in edgeR (count ≥ 10 in at least 15 samples). Specific blacklisted genes such as intronic genes, pseudogenes and non-coding RNA, whose expression might correspond to background noise arising due to biological variation, were also excluded from count matrices. Samples were classified into groups based on conditions in their study of origin, representing tumor occurrence, grade, disease stage, treatment modalities or other physiological factors. Count matrices were fit onto generalized linear models and comparisons between categorized sample groups were performed using QLF (quasi-likelihood F) tests, obtaining genes showing dysregulation in either direction. Highly significant genes were selected using cutoff values of absolute FC > 1.5 along with nominal p-value < 0.05. Gene set enrichment analyses (GSEA v4.0) [
26] were conducted with the normalized counts for determining pathways (represented by gene signatures) showing enrichment in each sample group; using the c2 (canonical pathways) and c6 (oncogenic signatures) gene set families obtained from the MSigDB database [
27]. Since SPP1 has been found to interact with specific other oncogenes during cancer progression, we also used a set of six highly correlated genes found in a breast cancer study (FAP, ACTA2, TWIST1, ITGB1, S100A4 and CXCL12) [
7], and looked for enrichment of MSigDB gene sets which involved SPP1 and at least one among these interacting genes. Significantly enriched pathways were determined using cutoff values of absolute NES (normalized enrichment score) > 1.4 along with nominal p-value < 0.05. Lists of the top 100 significantly dysregulated genes obtained in each DEG comparison per study, which showed patterns of co-expression with SPP1, were selected for analysis using Metascape (v3.5.20240101) [
28] in order to identify the top 20 functionally enriched pathways by their Gene Ontology (GO) terms, as well as the interactions between them listed in order of significance. Differential splicing analyses were performed using the R package ‘DEXSEQ’ [
29], in which reads were quantified for exonic regions within Ensembl transcripts per gene and compared using a negative binomial likelihood ratio test (LRT) between the categorized DEG groups, to examine differences in selective utilization of regions between the constituent transcripts per gene. Regions with significant differential exon usage (DEU) were quantified using a threshold of Benjamini-Hochberg (FDR) corrected p-value of less than 0.05.
3. Results
A total of 390 samples were processed successfully using the described pipeline (
Figure 1), for 4 types of cancer-affected human tissue (
Table 1). Quality control metrics displayed an average of 90% reads retained after filtering for low-quality bases, out of which around 75% reads were uniquely mapped to the Hg38 genome. Assignment of reads showed more than 70% mapped reads matching genomic features. Normalized and filtered count matrices obtained were grouped according to sample metadata for breast cancer (
Table 2), prostate cancer (
Table 3), renal cancer (
Table 4) and skin cancer (
Table 5) for downstream comparisons. During the final qualitative filtering for undesirable genes and those with uniformly low expression, about 75-80% genes with retained in the count matrix.
In the breast cancer study GSE213474 [
30], MCF7 cells treated with IFN-γ stimulation showed SPP1 to be significantly downregulated in samples (log2FC = -2.067, p-value = 0.038), when the treatment was applied for a longer duration (48 hours v/s 24 hours) (
Figure 2A). When analyzing PC-3 cell samples in the prostate cancer study GSE193127 [
38] which were transfected with siRNAs targeting FOXA1, a known prostate oncogene, SPP1 was expressed significantly higher (log2FC = 4.005, p-value = 1.658e-04), as compared to control samples which were transfected with non-targeting siRNA (
Figure 2B). Prostate PC-3 cells in the study GSE179990 [
39] showing an overexpression for nuclear transcription factor NF-YA, in both its long and short alternatively spliced isoforms also displayed higher SPP1 levels; with the shorter isoform ‘NF-YAs’ or ‘NFYAv2’, of length 318bp, having greater significance (log2FC = 3.673, p-value = 6.501e-04) (
Figure 2C), than the longer isoform ‘NF-YAl’ or ‘NFYAv1’, of length 347bp (log2FC = 3.578, p-value = 7.989e-04) (
Figure 2D).
When comparing renal cell carcinoma (RCC) tissue samples from the dataset GSE151419 [
42] with increasing Fuhrman grades, there was a significantly large difference observed in SPP1 expression between grade 4 and grade 2 cancer-affected samples (log2FC = 1.590, p-value = 0.046) (
Figure 2E). Finally, it was showed an upregulation in a skin cancer dataset (GSE84293) [
46], within samples affected by cutaneous squamous cell carcinoma (cuSCC) (log2FC = 2.603, p-value = 0.022), as compared to samples affected by actinic keratosis (AK) (
Figure 2F), which is a benign condition characterized by skin lesions, and may act as a precursor to cancer.
Our GSEA analyses confirmed patterns of significant enrichment for significant pathways (gene signatures) in samples showing high dysregulation in SPP1 and other oncogenes, concordantly as found in DEG analyses. PC-3 cells in the prostate cancer dataset GSE179990 [
39] showed greater overall enrichment in samples expressing the shorter isoform (318 bp long) of NF-YA, than its longer isoform (347 bp long) (
Figure 3A), with the top signature being ‘KRAS.600_UP.V1_UP’ (NES = 2.049, p-value < 0.001) (
Figure 3B). In renal cancer, tumorous samples from RCC tissue in the dataset GSE167573 [
40] enriched a larger number of signatures than controls (
Figure 3C), with the highest being ‘HOXA9_DN.V1_UP’ (NES = 3.138, p-value < 0.001) (
Figure 3D); while samples affected by IgA nephropathy in GSE141295 [
41] also showed more enrichment than unaffected controls (
Figure 3C), with the top signature being ‘TBK1.DF_DN’ (NES = 2.524, p-value < 0.001) (
Figure 3E). In another renal cancer dataset, GSE151419 [
42], samples showed a general surge in number of enriched signatures with higher tumor grades, from grade 2 to grade 4 (
Figure 3F). The signature ‘CSR_LATE_UP.V1_UP’ was highly enriched at grade 4 (NES = 2.506, p-value < 0.001) (
Figure 3G).
Metascape analyses using top significant DEGs expressed unidirectionally with SPP1, and its interacting partner genes also showed similar results in pathway involvement. In the breast cancer dataset GSE213474 [
30], many pathways related to cell growth and division to be involved in genes that had shown a significant downregulation with the application of IFN-γ treatment (
Figure 4A), including integrin cell surface interactions, a significant hallmark of SPP1 action. Prostate cancer samples in GSE179990 [
39], with high expression for NFYA (both its long and short isoforms), showed similar pathways to be present within highly upregulated genes (
Figure 4B and
Figure 4C, respectively). In renal cancer, tumorous samples in GSE167573 [
40] displayed a large number of cellular developmental pathways (
Figure 4D), including extracellular matrix organization, another key feature of SPP1 functionality. Samples in GSE151419 [
41] affected by renal tumors of higher grades also had similar effects, with several pathways within genes higher in Grade 4 RCC being related to regulation of cellular processes (
Figure 4E). Finally, similar effects were seen in the skin cancer dataset GSE113113 [
44,
45], within genes expressed higher in SCC samples (
Figure 4F) as compared to samples with actinic keratosis (AK).
In differential exon usage (DEU) analyses, significant changes in selectivity were observed for SPP1 and many of its interacting partner genes, in conditions consistent with DEG and pathway analysis results. In the breast cancer study GSE213474 [
30], the second set of MCF7 cell samples, which were supplemented with heat-inactivated FBS (HD) along with estradiol (E2), showed 3 exonic regions (E017, E018 and E019) of CXCL12 or ‘ENSG00000107562’, to differ significantly with longer time under treatment using IFN-γ stimulation (
Figure 5A). In the prostate cancer dataset GSE179990 [
39], PC-3 cells overexpressing NF-YA (both its short and long isoforms) showed 1 exon (E006) to be changed across transcripts of S100A4, or ‘ENSG00000196514’ (
Figure 5B). Cells from renal cell carcinoma (RCC) tissue in the dataset GSE167573 [
40] showed tumorous samples to have multiple exonic regions in several genes, among them SPP1 or ‘ENSG00000118785’ with 5 exons (
Figure 5C), S100A4 with 2 exons (
Figure 5D) and CXCL12 with 5 exons (
Figure 5E). SPP1 was also observed to have 10 exons significantly different in IgA nephropathy-affected samples from RCC tissue in the dataset GSE141295 [
41] (
Figure 5F) compared to normal cortex samples.
4. Discussion
The activation of proto-oncogenes by external carcinogenic factors and their resultant conversion into tumor-causing oncogenes has been well-observed to be the root cause of the development and subsequent progression of cancer in various human tissues. It can be brought about by their overexpression in tumorous cells, as well as other effects such as the presence of mutations or genetic alterations, or fusions occurring with other genes. They also have the ability to influence normal cellular pathways and capture them, utilizing their activity to facilitate critical cellular functions which promote their selective initial growth and development in tumorous cells, provide themselves with a proliferative advantage, maintain a supportive tumor environment and ultimately favor tumor invasion and metastasis to other cells and tissues [
48]. Here we sought to further investigate the multidimensional role of one of the most prominent oncogenes, SPP1, that encodes the protein Osteopontin in human tissues, including its functional mechanisms during cancer progression, its pathway enrichment landscape, as well as structural and other changes occurring in this gene and some of its highly co-expressed genes interacting with it that drive the same processes. Finally, we also wanted to determine these changes occurring in response to particular therapeutic modalities of cancer treatment.
Our transcriptomics analyses confirmed that SPP1 was significantly overexpressed not only in tumorous cells but also displayed changes with higher disease stage and tumor grades, in the case of renal and skin cancer. This effect was also observed in prostate cancer cells which overexpressed the oncogene NF-YA, thus indicating a synergistic co-expression with SPP1 during cancer progression; while transfection with FOXA1, another oncogene, showed it to be significantly upregulated as well. Application of treatment also produced a significant response in it, for example in breast cancer samples receiving IFN-γ stimulation, wherein samples on long-term treatment had a significant reduction of SPP1 expression. Treatment in prostate and skin cancer samples using modalities such as enzymatic inhibition and oncogenic suppression was also able to reduce SPP1 activation. In results from GSEA analyses, pathways involving SPP1, and its interacting partner oncogenes were seen to comprise the majority of all pathways showing activation with cancer progression or repression with therapeutic intervention. The existence of external conditions conducive to tumorigenesis, such as the occurrence of IgA nephropathy in renal tissue, was able to confirm the simultaneous involvement of other oncogenes such as KRAS contributing towards the upregulation of several other host genes, indicated by common pathways. Higher disease stages also influenced the activation of pathways implicated in reduced serum response in renal and skin cancer, which potentially indicated a weakening of the immune system in cases of late-stage cancer, wherein tumor cells have already undergone metastasis. Additional findings using Metascape supported these results as well, with genes upregulated in tumorigenic conditions and/or downregulated in response to treatment mostly being related to normal cell cycle, growth and functionality, also confirming results from previous studies which showed SPP1 to play a role in the enrichment of cellular development and signaling pathways [
7]. On the other hand, the application of treatment modalities was able to induce a number of immune response pathways, implying a restoration of the natural immune system, thereby validating immunotherapy to be a promising method of cancer treatment.
Lastly, analyses of differential isoform expression in oncogenes showed changes in exon selectivity within their constituent ‘transcripts’ or spliced isoforms, to occur concurrently with their patterns of dysregulation and functional enrichment; as well as indicated them as significant drivers of these effects. Variations in exonic region usage between transcripts of several oncogenes were seen to correspond well with changes in their actual expression levels. Since even minor changes in exonic expression have been observed to correlate with considerable differences in overall gene expression and resultant functionality, it also highlights the potential use of therapeutic methods such as exon and gene therapy, to target these effects and induce selective splicing within specific oncogenes and editing their transcriptional ability in cells.
In summary, the results of our analyses confirmed the multifaceted role of SPP1 in several aspects of cancer progression, thus establishing it as one of the most potent oncogenes, while also opening new avenues for its diagnostic utilization as a biomarker and in the development of novel therapeutic modalities.