Version 1
: Received: 29 January 2024 / Approved: 30 January 2024 / Online: 31 January 2024 (01:49:19 CET)
Version 2
: Received: 17 April 2024 / Approved: 18 April 2024 / Online: 18 April 2024 (09:55:50 CEST)
How to cite:
Mukhopadhyay, D.; Phanord, D. D.; Dalpatadu, R. J.; Gewali, L. P.; Singh, A. K. ML Classification of Cancer Types Using High Dimensional Gene Expression Microarray Data. Preprints 2024, 2024012067. https://doi.org/10.20944/preprints202401.2067.v2
APA Style
Mukhopadhyay, D., Phanord, D. D., Dalpatadu, R. J., Gewali, L. P., & Singh, A. K. (2024). ML Classification of Cancer Types Using High Dimensional Gene Expression Microarray Data. Preprints. https://doi.org/10.20944/preprints202401.2067.v2
Chicago/Turabian Style
Mukhopadhyay, D., Laxmi P Gewali, and Ashok K Singh. 2024. "ML Classification of Cancer Types Using High Dimensional Gene Expression Microarray Data." Preprints. https://doi.org/10.20944/preprints202401.2067.v2
Abstract
Cancer, a disease caused by the abnormal growth of cells in different parts of the body, is one of the leading causes of death globally. Microarray gene expression data play a critical role in the identification and classification of cancer tissues. Owing to recent advances in machine learning (ML), researchers are applying a variety of ML techniques to gene expression data to model the progression rate and treatment of cancer patients, with considerable success. However, the high dimensionality of gene expression datasets, together with the presence of highly correlated columns, leads to computational difficulties. This paper proposes the use of two ML classification techniques, Linear Discriminant Analysis (LDA) and Random Forest (RF), for classifying five types of cancer (breast, kidney, colon, lung, and prostate cancer) based on high-dimensional microarray gene expression data. Principal component analysis (PCA) was used for dimensionality reduction, and the principal component scores of the raw data were used for classification. Six classification performance measures were used to evaluate these approaches; the RF method achieved higher accuracy than the LDA method. The method and results of this article should be helpful to researchers working with microarray data that contain many genes.
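The workflow summarized in the abstract (PCA for dimensionality reduction, followed by LDA and RF classification on the principal component scores) can be sketched roughly as follows. This is a minimal illustration that assumes scikit-learn and uses synthetic data in place of the microarray datasets. The sample and feature counts, the number of components, the forest size, the train/test split, and the three metrics shown are illustrative assumptions, not the authors' settings or their six measures.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption) for a gene expression matrix:
# few samples, many correlated features, five classes.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=30, n_redundant=100,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def evaluate(classifier, n_components=20):
    """Standardize, project onto principal components, then classify."""
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=0),
        classifier,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "macro_f1": f1_score(y_test, pred, average="macro"),
        # One-vs-rest AUC, averaged with equal weight per class.
        "macro_auc": roc_auc_score(
            y_test, proba, multi_class="ovr", average="macro"
        ),
    }


results = {
    "LDA": evaluate(LinearDiscriminantAnalysis()),
    "RF": evaluate(RandomForestClassifier(n_estimators=100, random_state=0)),
}
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Fitting the scaler and PCA inside the pipeline means the projection is learned from the training split only, so the held-out scores are not influenced by the test samples. Which classifier scores higher on this synthetic data says nothing about the results reported for the real datasets.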
Keywords
principal components analysis; Linear Discriminant Analysis; Random Forest; precision; recall; F1; AUC; macro-averaged AUC; micro-averaged AUC
Subject
Biology and Life Sciences, Biochemistry and Molecular Biology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.