Version 1
: Received: 22 October 2024 / Approved: 22 October 2024 / Online: 23 October 2024 (07:26:42 CEST)
Version 2
: Received: 25 October 2024 / Approved: 25 October 2024 / Online: 25 October 2024 (15:04:04 CEST)
How to cite:
Wu, X.; Xiao, Y.; Liu, X. Multi-Class Classification of Breast Cancer Gene Expression Using PCA and XGBoost. Preprints 2024, 2024101775. https://doi.org/10.20944/preprints202410.1775.v2
APA Style
Wu, X., Xiao, Y., & Liu, X. (2024). Multi-Class Classification of Breast Cancer Gene Expression Using PCA and XGBoost. Preprints. https://doi.org/10.20944/preprints202410.1775.v2
Chicago/Turabian Style
Wu, X., Yimeng Xiao, and Xueying Liu. 2024. "Multi-Class Classification of Breast Cancer Gene Expression Using PCA and XGBoost." Preprints. https://doi.org/10.20944/preprints202410.1775.v2
Abstract
In the era of big data, machine learning has transformed cancer research, particularly the analysis of gene expression data. This study applies machine learning models to the GSE45827 dataset, which contains breast cancer gene expression profiles. With over 54,000 genes and 151 samples divided into six classes, the dataset poses a high-dimensional challenge, which is addressed using dimensionality reduction techniques such as Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). Of these, PCA proved most effective at retaining the critical features of the data in lower dimensions, allowing clearer visualization and better model performance. The reduced dataset was then classified with the eXtreme Gradient Boosting (XGBoost) model, achieving promising multi-class classification results. The model achieved high precision, recall, and F1-scores for several classes, performing best on classes 1, 2, and 5. However, some classes, such as 0 and 4, showed lower recall, highlighting areas for further refinement. Combining PCA with XGBoost not only improved the interpretability and computational efficiency of the model but also supported accurate identification of breast cancer subtypes, underscoring the value of machine learning in cancer diagnosis and treatment.
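The pipeline summarized above (PCA to compress tens of thousands of gene features, then per-class evaluation of a multi-class classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `pca_reduce` and `per_class_metrics` are hypothetical, the data are random synthetic values rather than GSE45827, and PCA is computed directly with NumPy's SVD. The XGBoost training step is omitted to keep the sketch dependency-light; in practice the PCA scores would be passed to a classifier such as `xgboost.XGBClassifier`.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project samples (rows of X) onto the top principal components.

    Hypothetical helper: PCA via SVD of the column-centred matrix. The rows of
    Vt are the principal axes, already sorted by decreasing singular value.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # full_matrices=False keeps the SVD small when genes far outnumber samples.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]           # shape (n_components, n_genes)
    scores = X_centered @ components.T       # shape (n_samples, n_components)
    explained = S**2 / np.sum(S**2)          # fraction of variance per component
    return scores, explained[:n_components]


def per_class_metrics(y_true, y_pred, classes):
    """Hypothetical helper: one-vs-rest precision, recall and F1 per class."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Synthetic stand-in data: 20 samples x 500 "genes" (not GSE45827).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
scores, explained = pca_reduce(X, n_components=2)
print(scores.shape)  # (20, 2)

# Toy labels for three classes; one class-2 sample is predicted as class 0.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
report = per_class_metrics(y_true, y_pred, classes=[0, 1, 2])
for c, m in report.items():
    print(c, {k: round(v, 3) for k, v in m.items()})
```

In the toy labels, class 2 has perfect precision but a recall of 0.5, because one of its two samples is predicted as class 0; this is the kind of pattern the abstract reports for classes 0 and 4. Computing PCA through the SVD of the centred sample matrix also avoids forming the genes-by-genes covariance matrix, which for about 54,000 genes would have roughly 54,000 × 54,000 entries.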
Keywords
XGBoost; PCA; t-SNE; machine learning; cancer gene expression
Subject
Biology and Life Sciences, Cell and Developmental Biology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.