Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.
PyCaret for Predicting Type 2 Diabetes: A Phenotype and Gender‐Based Approach with the Nurses’ Health Study and the Health Professionals’ Follow‐Up Study
Version 1
Received: 17 June 2024 / Approved: 17 June 2024 / Online: 17 June 2024 (12:59:11 CEST)
How to cite:
Gul, S.; Ayturan, K.; Hardalaç, F. PyCaret for Predicting Type 2 Diabetes: A Phenotype and Gender‐Based Approach with the Nurses’ Health Study and the Health Professionals’ Follow‐Up Study. Preprints 2024, 2024061148. https://doi.org/10.20944/preprints202406.1148.v1
APA Style
Gul, S., Ayturan, K., & Hardalaç, F. (2024). PyCaret for Predicting Type 2 Diabetes: A Phenotype and Gender‐Based Approach with the Nurses’ Health Study and the Health Professionals’ Follow‐Up Study. Preprints. https://doi.org/10.20944/preprints202406.1148.v1
Chicago/Turabian Style
Gul, S., Kubilay Ayturan, and Fırat Hardalaç. 2024. "PyCaret for Predicting Type 2 Diabetes: A Phenotype and Gender‐Based Approach with the Nurses’ Health Study and the Health Professionals’ Follow‐Up Study." Preprints. https://doi.org/10.20944/preprints202406.1148.v1
Abstract
Predicting type 2 diabetes mellitus (T2DM) from phenotypic data with machine learning (ML) techniques has received significant attention in recent years. PyCaret, an ML tool that enables 16 different algorithms to be applied simultaneously, was used to predict T2DM from phenotypic variables in the “Nurses’ Health Study” and “Health Professionals’ Follow-Up Study” datasets. Ridge Classifier, Linear Discriminant Analysis, and Logistic Regression (LR) were the best-performing models for the male-only data subset. For the female-only data subset, LR, Gradient Boosting Classifier, and CatBoost Classifier were the strongest models. The AUC, accuracy, and precision were 0.77, 0.70, and 0.70 for males and 0.79, 0.70, and 0.71 for females, respectively. The feature importance plot showed that family history of diabetes (famdb), never having smoked, and high blood pressure (hbp) were the most influential features in females, while famdb, hbp, and current smoking were the major variables in males. In conclusion, PyCaret enabled rapid analysis for T2DM prediction by simplifying complex ML tasks. Gender differences are an important consideration in T2DM prediction. Even with this comprehensive ML tool, phenotypic variables alone may not be sufficient for early T2DM prediction; future studies could combine them with genotypic variables.
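The abstract reports AUC, accuracy, and precision separately for the male and female subsets. As a minimal sketch of what those three numbers measure, assuming each held-out record carries a sex label, a binary T2DM outcome, and a model's predicted probability, the metrics can be computed per subset with the Python standard library alone. The function names, the 0.5 decision threshold, and the toy records below are hypothetical illustrations, not the authors' code or data; PyCaret's model-comparison grid reports these same metrics for each candidate model.

```python
from typing import Dict, Iterable, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP): share of predicted T2DM cases that are true cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random case outscores a random
    control (Mann-Whitney formulation; tied scores count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_by_sex(
    records: Iterable[Tuple[str, int, float]], threshold: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Group (sex, y_true, score) records by sex and compute metrics per group.

    The 0.5 threshold is a hypothetical default that turns scores into labels.
    """
    groups: Dict[str, Tuple[list, list]] = {}
    for sex, y, s in records:
        ys, ss = groups.setdefault(sex, ([], []))
        ys.append(y)
        ss.append(s)

    results = {}
    for sex, (ys, ss) in groups.items():
        preds = [1 if s >= threshold else 0 for s in ss]
        results[sex] = {
            "auc": roc_auc(ys, ss),
            "accuracy": accuracy(ys, preds),
            "precision": precision(ys, preds),
        }
    return results


# Hypothetical toy records: (sex, observed T2DM, predicted probability).
records = [
    ("F", 1, 0.9), ("F", 0, 0.4), ("F", 1, 0.6), ("F", 0, 0.7),
    ("M", 1, 0.8), ("M", 0, 0.2), ("M", 0, 0.55), ("M", 1, 0.3),
]
results = evaluate_by_sex(records)
print(results)
```

Evaluating each sex separately, rather than pooling, mirrors the abstract's gender-stratified design: a model can rank cases well (high AUC) in one subset while its thresholded accuracy or precision differs in the other, as the toy male subset shows.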
Keywords
type 2 diabetes mellitus; PyCaret; machine learning; prediction; feature importance plot; SHAP value
Subject
Medicine and Pharmacology, Endocrinology and Metabolism
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.