Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

An Explainable AI-based Machine Learning Approach for Predicting Diabetes in the Early Stage Using the Influential Features

Version 1: Received: 5 June 2024 / Approved: 5 June 2024 / Online: 12 June 2024 (14:16:43 CEST)

How to cite: Das, U.; Ahmed, B. An Explainable AI-based Machine Learning Approach for Predicting Diabetes in the Early Stage Using the Influential Features. Preprints 2024, 2024060364. https://doi.org/10.20944/preprints202406.0364.v1

Abstract

Diabetes is one of the most prevalent illnesses. Although it does not directly cause patient mortality, it increases the risk of death. Predicting a disease in its early stages can lessen its fatal effects and improve the quality of the healthcare system. Early-stage prediction of diabetes and similar non-communicable diseases requires a proper set of influential features. This research develops a machine learning-based disease prediction model that identifies the influential features for diabetes prediction and achieves near-perfect classification accuracy. The model uses Min-Max normalization for data normalization, Isolation Forest (iForest) for outlier removal, and the Synthetic Minority Oversampling Technique (SMOTE) for oversampling. To identify the influential features, it applies three feature selection methods: Random Forest based Recursive Feature Elimination (RFE-RF), the Chi-Square test, and the Minimum Redundancy Maximum Relevance (mRMR) test. Classification is performed with Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB). The results indicate that the proposed model outperforms previous models and studies. The SVM attained a classification accuracy of 99.58% using the five features chosen by the Chi-Square test. Lastly, SHAP, an explainable AI technique, was used to assess the classifier model's performance. The selected features and the classifier model can be used for early-stage diabetes prediction.
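To make two of the steps named in the abstract concrete, the following is a minimal sketch of Min-Max normalization followed by Chi-Square feature ranking, written in plain Python. It is not the authors' implementation, which this page does not show. The function names, the toy data, and the choice to treat non-negative feature values as counts when computing the Chi-Square statistic are all assumptions made for illustration. Outlier removal, oversampling, and the classifier are omitted.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def chi_square_scores(rows: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> List[float]:
    """Chi-square statistic per feature (assumption: values act as counts)."""
    classes = sorted(set(labels))
    n = len(rows)
    scores = []
    for j in range(len(rows[0])):
        total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            # Observed: feature mass that falls in class c.
            observed = sum(row[j] for row, y in zip(rows, labels) if y == c)
            # Expected: feature mass split in proportion to class frequency.
            expected = total * sum(1 for y in labels if y == c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_features(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Hypothetical toy data: 3 features per patient, binary label (1 = diabetic).
X = [
    [1.0, 40.0, 0.0],
    [1.0, 55.0, 1.0],
    [0.0, 42.0, 1.0],
    [0.0, 58.0, 0.0],
]
y = [1, 1, 0, 0]

X_scaled = min_max_normalize(X)
scores = chi_square_scores(X_scaled, y)
selected = top_k_features(scores, k=1)
print(selected)
```

In this toy example the first feature perfectly separates the two classes, so it receives the highest score and is the one selected. In the study itself, the abstract reports that five features were retained by the Chi-Square test.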

Keywords

communicable disease; diabetes; feature selection; imbalanced dataset; insulin; kernel function; outlier samples

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
