Version 1
Received: 5 June 2024 / Approved: 5 June 2024 / Online: 12 June 2024 (14:16:43 CEST)
How to cite:
Das, U.; Ahmed, B. An Explainable AI-based Machine Learning Approach for Predicting Diabetes in the Early Stage Using the Influential Features. Preprints 2024, 2024060364. https://doi.org/10.20944/preprints202406.0364.v1
APA Style
Das, U., & Ahmed, B. (2024). An Explainable AI-based Machine Learning Approach for Predicting Diabetes in the Early Stage Using the Influential Features. Preprints. https://doi.org/10.20944/preprints202406.0364.v1
Chicago/Turabian Style
Das, U., and Boshir Ahmed. 2024. "An Explainable AI-based Machine Learning Approach for Predicting Diabetes in the Early Stage Using the Influential Features." Preprints. https://doi.org/10.20944/preprints202406.0364.v1
Abstract
Diabetes, one of the most prevalent illnesses, does not directly cause patient mortality, but it increases the risk of death. Predicting a disease in its early stages can lessen its fatal effects and improve the quality of the healthcare system. Early-stage prediction of diabetes and similar non-communicable diseases requires a proper set of influential features. This research develops a machine learning-based disease prediction model that identifies the influential features for diabetes prediction and achieves near-perfect classification accuracy. The model uses Min-Max normalization for data normalization, Isolation Forest (iForest) for outlier removal, and the Synthetic Minority Oversampling Technique (SMOTE) for oversampling. Influential features are identified with three feature selection methods: Random Forest based Recursive Feature Elimination (RFE-RF), the Chi-Square test, and the Minimum Redundancy Maximum Relevancy (mRMR) test. Classification is performed with Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naive Bayes (NB). The results show that the proposed model outperforms the models reported in previous studies. The SVM attained a classification accuracy of 99.58% using the five features chosen by the Chi-Square test. Lastly, SHAP, an explainable AI method, has been used to assess the classifier model's performance. The selected features and the classifier model can be used for early-stage diabetes prediction.
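As a rough illustration of two of the steps named in the abstract, the sketch below applies Min-Max normalization and then ranks features by a Chi-Square score, keeping the top k. It is not the authors' implementation: the function names, the toy data, and the feature names are hypothetical, and the score follows one common formulation for non-negative features, in which observed per-class feature sums are compared with the sums expected from class frequency alone, $\chi^2 = \sum_c (O_c - E_c)^2 / E_c$. The iForest, SMOTE, and classifier stages are omitted.

```python
from typing import Dict, List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def chi_square_score(feature: Sequence[float], labels: Sequence[int]) -> float:
    """Chi-square statistic between a non-negative feature and class labels.

    Observed values are the per-class sums of the feature. Expected values
    split the feature total in proportion to class frequency.
    """
    total = sum(feature)
    n = len(labels)
    score = 0.0
    for cls in set(labels):
        observed = sum(v for v, y in zip(feature, labels) if y == cls)
        expected = total * sum(1 for y in labels if y == cls) / n
        if expected > 0:
            score += (observed - expected) ** 2 / expected
    return score


def select_top_k(columns: Dict[str, Sequence[float]],
                 labels: Sequence[int], k: int) -> List[str]:
    """Normalize each column, score it, and return the k best feature names."""
    scores = {name: chi_square_score(min_max_normalize(col), labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "f_informative": [1, 1, 1, 0, 0, 0],  # perfectly aligned with the label
    "f_noise": [1, 0, 1, 0, 1, 0],        # weakly related to the label
    "f_constant": [5, 5, 5, 5, 5, 5],     # carries no information
}

top = select_top_k(features, labels, k=2)
print(top)  # ['f_informative', 'f_noise']
```

In this toy example the perfectly aligned feature scores 3.0, the noisy one about 0.33, and the constant one 0, so the first two are kept. Under the same assumptions, calling the function with k=5 on a real feature table would mirror the five-feature selection described in the abstract.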
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.