Version 1
Received: 5 February 2021 / Approved: 9 February 2021 / Online: 9 February 2021 (10:26:47 CET)
How to cite:
Perera, Y.; Gonzalez, A.; Perez, R. Principal Component Analysis of RNA-seq Data Unveils a Novel Prostate Cancer-Associated Gene Expression Signature. Preprints 2021, 2021020234. https://doi.org/10.20944/preprints202102.0234.v1
APA Style
Perera, Y., Gonzalez, A., & Perez, R. (2021). Principal Component Analysis of RNA-seq Data Unveils a Novel Prostate Cancer-Associated Gene Expression Signature. Preprints. https://doi.org/10.20944/preprints202102.0234.v1
Chicago/Turabian Style
Perera, Y., Augusto Gonzalez, and Rolando Perez. 2021. "Principal Component Analysis of RNA-seq Data Unveils a Novel Prostate Cancer-Associated Gene Expression Signature." Preprints. https://doi.org/10.20944/preprints202102.0234.v1
Abstract
Prostate cancer (PCa) is a highly heterogeneous disease and the second most common tumor in males. Molecular and genetic profiles have been used to identify subtypes and guide therapeutic intervention. However, roughly 26% of primary PCa cases are driven by unknown molecular lesions. We use Principal Component Analysis (PCA) and custom RNA-seq data normalization to identify a gene expression signature that segregates primary prostate adenocarcinoma (PRAD) from normal tissues. This Core-Expression Signature (PRAD-CES) includes 33 genes and accounts for 39% of data complexity along the PC1-cancer axis. The PRAD-CES is populated by protein-coding genes (AMACR, TP63, HPN) and RNA genes (PCA3, ARLN1) sparsely found in previous studies, validated/predicted biomarkers (HOXC6, TDRD1, DLX1), and/or cancer drivers (PCA3, ARLN1, PCAT-14). Of note, the PRAD-CES also comprises six over-expressed lncRNAs without previous PCa association, four of them potentially modulating the driver genes TMPRSS2, PRUNE2 and AMACR. Overall, our PCA captures 57% of data complexity within PC1-3. GO enrichment and correlation analysis involving major clinical features (i.e., Gleason Score, AR Score, TMPRSS2-ERG fusion and Tumor Cellularity) suggest that the PC2 and PC3 gene signatures might describe more aggressive and inflammation-prone transitional forms of PRAD. The genes surfaced here may provide novel prognostic biomarkers and molecular alterations amenable to intervention. In particular, our work uncovered RNA genes with appealing implications for PCa biology and progression.
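As a rough illustration of how figures such as "39% along PC1" or "57% within PC1-3" are typically obtained, the sketch below runs a plain PCA on a synthetic expression matrix and reports the fraction of variance carried by each component. It is not the authors' pipeline: their custom normalization is not described in this abstract, so an ordinary log2 transform with a pseudocount stands in for it, and the function name `pca_explained_variance`, the pseudocount value, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def pca_explained_variance(expr, pseudocount=1.0):
    """PCA of a (samples x genes) expression matrix via SVD.

    Returns the explained-variance ratio per component, the sample
    scores, and the gene loadings. The log2 transform is an assumed
    stand-in for the custom normalization mentioned in the abstract.
    """
    expr = np.asarray(expr, dtype=float)
    log_expr = np.log2(expr + pseudocount)
    # Center each gene on its mean across samples (standard PCA).
    centered = log_expr - log_expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    ratio = variance / variance.sum()
    scores = u * s  # coordinates of each sample on each PC
    return ratio, scores, vt


# Synthetic data (hypothetical): 20 "normal" and 40 "tumor" samples,
# 200 genes, with the first 10 genes over-expressed in tumors.
rng = np.random.default_rng(0)
n_normal, n_tumor, n_genes = 20, 40, 200
expr = rng.lognormal(mean=3.0, sigma=0.3, size=(n_normal + n_tumor, n_genes))
expr[n_normal:, :10] *= 8.0
is_tumor = np.arange(n_normal + n_tumor) >= n_normal

ratio, scores, loadings = pca_explained_variance(expr)
cumulative = np.cumsum(ratio)

print("PC1 share of variance:", round(float(ratio[0]), 3))
print("PC1-3 cumulative share:", round(float(cumulative[2]), 3))

# Genes with the largest absolute PC1 loadings are candidate signature genes.
top_genes = np.argsort(np.abs(loadings[0]))[::-1][:10]
print("Top PC1 genes (column indices):", sorted(top_genes.tolist()))
```

In this toy setting the tumor and normal samples fall on opposite sides of PC1, and the genes with the largest PC1 loadings are the ones that were shifted, which mirrors the idea of reading a signature off the "PC1-cancer axis".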
Keywords
Principal Component Analysis, RNA-seq, prostate cancer, biomarkers, RNA genes
Subject
Biology and Life Sciences, Anatomy and Physiology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.