Preprint Article, Version 1. This version is not peer-reviewed.

Alzheimer Stage Diagnosis from Genomic and Clinical Data Modalities Using ‘Artificial Neural Network’

Version 1: Received: 30 August 2024 / Approved: 30 August 2024 / Online: 3 September 2024 (05:01:42 CEST)

How to cite: Sarma, M.; Chatterjee, S. Alzheimer Stage Diagnosis from Genomic and Clinical Data Modalities Using ‘Artificial Neural Network’. Preprints 2024, 2024082231. https://doi.org/10.20944/preprints202408.2231.v1

Abstract

INTRODUCTION: This study focuses on diagnosing the stages of AD (Alzheimer’s disease), including MCI (Mild Cognitive Impairment), from two data modalities of ADNI (Alzheimer’s Disease Neuroimaging Initiative) participants, gene expression and clinical data, using multiclassification. The gene expression dataset is highly imbalanced and has HDLSS (high-dimensional, low-sample-size) characteristics. This is the only study in which multiclassification-based diagnosis is used to identify multiple stages of Alzheimer’s disease. We achieve the best multiclassification results in both modalities and identify new genetic biomarkers.

METHODS: A combination of XGBoost and SFBS (“Sequential Floating Backward Selection”) is used to select features. We select the 95 most effective gene probesets out of 49,386. For the clinical study data, the 8 most effective biomarkers are selected using SFBS. For both genomic and clinical data, a DL (‘Deep Learning’) classifier is used to identify the stages: CN (Cognitively Normal), MCI, and AD (Alzheimer’s Disease / Dementia). Because of the high imbalance in the genomic data, borderline oversampling is used for model training, and the original data is used for validation.

RESULTS & DISCUSSION: With clinical data, we achieve ROC AUC scores of 0.97, 0.95, and 0.94 for the CN, MCI, and Dementia stages, respectively. With gene expression data, we achieve ROC AUC scores of 0.75, 0.74, and 0.70 for the CN, MCI, and Dementia stages, respectively, and 0.67 for both the micro-average F1 score and the micro-weighted F1 score. This is the best result so far for AD stage diagnosis from gene expression profile data through multiclassification with ADNI data. The results indicate that our multiclassification model can efficiently handle imbalanced data of HDLSS nature and identify samples of the minority class. MAPK14, ZNF835, MID1, HLA-DQA1, and TEP1 are some of the new genes found to be associated with AD risk. DRAXIN, HSPA12B, USP47, and others are found to be AD-preventive or suppressor genes.
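The SFBS step named in the methods can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function `sfbs`, the toy scoring function `toy_score`, and the set `RELEVANT` are hypothetical names introduced only for illustration. In the study the score would presumably come from a classifier evaluated on the selected features, which is an assumption here. The sketch starts from all features, repeatedly drops the feature whose removal gives the best score, and after each removal conditionally re-adds a dropped feature when that beats the best subset of the same size seen so far.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sfbs(n_features: int,
         score: Callable[[Subset], float],
         k_target: int) -> Tuple[float, Subset]:
    """Sequential Floating Backward Selection down to k_target features.

    `score` maps a feature subset to a number where higher is better.
    Returns the best (score, subset) pair found for size k_target.
    """
    if not 1 <= k_target <= n_features:
        raise ValueError("k_target must be between 1 and n_features")

    all_features = frozenset(range(n_features))
    current = all_features
    # Best (score, subset) seen so far for every subset size visited.
    best: Dict[int, Tuple[float, Subset]] = {n_features: (score(current), current)}

    while len(current) > k_target:
        # Backward step: remove the feature whose removal scores best.
        current = max((current - {f} for f in sorted(current)), key=score)
        s = score(current)
        if len(current) not in best or s > best[len(current)][0]:
            best[len(current)] = (s, current)

        # Floating step: re-add a removed feature only while doing so
        # strictly beats the best known subset of that larger size.
        while len(current) < n_features:
            removed = sorted(all_features - current)
            grown = max((current | {f} for f in removed), key=score)
            grown_score = score(grown)
            if grown_score <= best[len(grown)][0]:
                break
            best[len(grown)] = (grown_score, grown)
            current = grown

    return best[k_target]


# Toy problem (assumption): only features 0 and 2 carry signal, and every
# extra feature costs a small penalty, so smaller relevant subsets win.
RELEVANT = frozenset({0, 2})


def toy_score(subset: Subset) -> float:
    return len(subset & RELEVANT) - 0.1 * len(subset)


best_score, best_subset = sfbs(n_features=5, score=toy_score, k_target=2)
print(sorted(best_subset), round(best_score, 2))
```

On this toy problem the search ends with the two informative features, 0 and 2. With a real, expensive scoring function such as cross-validated classifier performance, results of `score` would normally be cached, since the sketch re-evaluates some subsets.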

Keywords

disease stage diagnosis; blood gene expression; data imbalance; multiclassification; F1 score; AD risk gene; SHAP; LIME

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
