Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Multi-Class Classification of Breast Cancer Gene Expression Using PCA and XGBoost

Version 1 : Received: 22 October 2024 / Approved: 22 October 2024 / Online: 23 October 2024 (07:26:42 CEST)
Version 2 : Received: 25 October 2024 / Approved: 25 October 2024 / Online: 25 October 2024 (15:04:04 CEST)

How to cite: Wu, X.; Sun, Y.; Liu, X. Multi-Class Classification of Breast Cancer Gene Expression Using PCA and XGBoost. Preprints 2024, 2024101775. https://doi.org/10.20944/preprints202410.1775.v1

Abstract

In the era of big data, machine learning has revolutionized cancer research, particularly the analysis of gene expression data. This study applies machine learning models to the GSE45827 dataset, which contains breast cancer gene expression profiles. With over 54,000 genes and 151 samples categorized into six classes, the dataset poses a high-dimensional challenge, which is addressed using dimensionality reduction techniques such as Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). PCA proved the most effective at retaining the critical features of the data in lower dimensions, allowing for clearer visualization and enhanced model performance. The reduced dataset was then classified using the eXtreme Gradient Boosting (XGBoost) model, achieving promising multi-class classification results. The model demonstrated high precision, recall, and F1-scores across several classes, performing best on classes 1, 2, and 5. However, classes 0 and 4 exhibited lower recall, highlighting areas for further refinement. The integration of PCA and XGBoost not only improved the interpretability and computational efficiency of the model but also contributed to the accurate identification of breast cancer subtypes, emphasizing the importance of machine learning in cancer diagnosis and treatment.
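The per-class precision, recall, and F1-scores mentioned in the abstract can be computed from true and predicted labels as in the minimal sketch below. It is an illustration only, not the authors' evaluation code: the function name `per_class_report` is hypothetical, and the toy labels are invented for the example and are not results from the study.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for a multi-class problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    actual = Counter(y_true)     # how often each class truly occurs
    predicted = Counter(y_pred)  # how often each class was predicted
    tp = Counter()               # correct predictions per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1

    report = {}
    for c in sorted(set(actual) | set(predicted)):
        # Precision: of the samples predicted as c, the fraction that are c.
        precision = tp[c] / predicted[c] if predicted[c] else 0.0
        # Recall: of the samples that are truly c, the fraction that were found.
        recall = tp[c] / actual[c] if actual[c] else 0.0
        # F1: harmonic mean of precision and recall (0 when both are 0).
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = (precision, recall, f1)
    return report


# Toy labels, invented purely for illustration (not data from the study).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
report = per_class_report(y_true, y_pred)

for cls, (p, r, f) in report.items():
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case class 0 has perfect precision but only half of its samples are recovered, which is the kind of low-recall pattern the abstract reports for classes 0 and 4.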

Keywords

XGBoost; PCA; t-SNE; machine learning; cancer gene expression

Subject

Biology and Life Sciences, Cell and Developmental Biology
