Version 1
Received: 20 October 2024 / Approved: 21 October 2024 / Online: 21 October 2024 (11:57:07 CEST)
How to cite:
Arumalla, K.; Haince, J.-F.; Bux, R. A.; Huang, G.; Tappia, P. S.; Ford, W. R.; Vaida, M. L. Metabolomics-Based Machine Learning Models Accurately Predict Breast Cancer Estrogen Receptor Status. Preprints 2024, 2024101558. https://doi.org/10.20944/preprints202410.1558.v1
APA Style
Arumalla, K., Haince, J. F., Bux, R. A., Huang, G., Tappia, P. S., Ford, W. R., & Vaida, M. L. (2024). Metabolomics-Based Machine Learning Models Accurately Predict Breast Cancer Estrogen Receptor Status. Preprints. https://doi.org/10.20944/preprints202410.1558.v1
Chicago/Turabian Style
Arumalla, K., W. Rand Ford and Maria L. Vaida. 2024. "Metabolomics-Based Machine Learning Models Accurately Predict Breast Cancer Estrogen Receptor Status." Preprints. https://doi.org/10.20944/preprints202410.1558.v1
Abstract
Breast cancer is a global health concern and a leading cause of death among women. Early and precise diagnosis is vital for managing the disease effectively. Breast cancer subtyping based on estrogen receptor (ER) status is crucial for determining prognosis and treatment. This study uses metabolomics data from plasma samples to identify metabolite biomarkers that could distinguish ER-positive from ER-negative breast cancers non-invasively. The dataset includes demographic information, ER status, and metabolite levels from 188 breast cancer patients and 73 healthy controls. Supervised, unsupervised, and ensemble machine learning (ML) algorithms, including Support Vector Machines (SVM), Multidimensional Scaling (MS), Logistic Regression (LR), and ensemble learning, were applied to identify key metabolites associated with ER status. The most informative feature set, containing 28 biomarkers and two demographic factors, achieved 96% accuracy and an area under the curve (AUC) of 93% using the Logistic Regression model. These results suggest that ML has great promise for identifying specific metabolites linked to ER expression, paving the way for a novel, non-invasive, and efficient analytical tool that could minimize current laboratory challenges such as sample handling, subjective interpretation of results, and biological heterogeneity of the tumor, thereby aiding more precise diagnosis of breast cancer.
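The abstract reports two metrics, accuracy and AUC, which measure different things: accuracy depends on a decision threshold, while AUC reflects how well the model ranks positive cases above negative ones. The following is a minimal illustrative sketch of that distinction, not the authors' pipeline. The function names, the 0.5 threshold, and the toy labels and scores are all assumptions made up for this example; they are not data from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, t in zip(scores, y_true) if t == 1]
    negatives = [s for s, t in zip(scores, y_true) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = ER-positive, 0 = ER-negative.
# The scores are invented model outputs, not results from the study.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

toy_accuracy = accuracy(toy_labels, toy_scores)  # depends on the 0.5 threshold
toy_auc = roc_auc(toy_labels, toy_scores)        # threshold-free ranking quality
```

On this toy input the two metrics diverge, which is why a model can report a higher accuracy than AUC (or the reverse) on the same data: moving the threshold changes accuracy but leaves AUC unchanged.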
Keywords
Estrogen receptors; Breast cancer; Metabolomics; Machine learning models
Subject
Biology and Life Sciences, Other
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.