Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

PyCaret for Predicting Type 2 Diabetes: A Phenotype and Gender‐Based Approach with the Nurses’ Health Study and the Health Professionals’ Follow‐Up Study

Version 1: Received: 17 June 2024 / Approved: 17 June 2024 / Online: 17 June 2024 (12:59:11 CEST)

How to cite: Gul, S.; Ayturan, K.; Hardalaç, F. PyCaret for Predicting Type 2 Diabetes: A Phenotype and Gender‐Based Approach with the Nurses’ Health Study and the Health Professionals’ Follow‐Up Study. Preprints 2024, 2024061148. https://doi.org/10.20944/preprints202406.1148.v1

Abstract

Predicting type 2 diabetes mellitus (T2DM) from phenotypic data with machine learning (ML) techniques has received significant attention in recent years. PyCaret, a low-code ML library that can train and compare 16 different classification algorithms in a single run, was used to predict T2DM from phenotypic variables in the “Nurses’ Health Study” and “Health Professionals’ Follow-up Study” datasets. Ridge Classifier, Linear Discriminant Analysis, and Logistic Regression (LR) were the best-performing models for the male-only data subset. For the female-only data subset, LR, Gradient Boosting Classifier, and CatBoost Classifier were the strongest models. The AUC, accuracy, and precision were 0.77, 0.70, and 0.70 for males and 0.79, 0.70, and 0.71 for females, respectively. The feature importance plot showed that a family history of diabetes (famdb), never having smoked, and high blood pressure (hbp) were the most influential features in females, whereas famdb, hbp, and current smoking were the main variables in males. In conclusion, PyCaret enabled rapid analysis for T2DM prediction by simplifying complex ML tasks. Gender differences are an important consideration in T2DM prediction. However, even with a comprehensive ML tool, phenotypic variables alone may not be sufficient for early T2DM prediction; future studies could combine them with genotypic variables.
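
The three metrics reported above (AUC, accuracy, precision) can be made concrete with a short sketch. The code below is a minimal, dependency-free illustration using made-up labels and scores, not data from either cohort, and it is not the authors' pipeline; the function names and the 0.5 decision threshold are assumptions chosen for illustration. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count half), which equals the area under the ROC curve.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted T2DM cases (label 1) that truly are T2DM cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = developed T2DM, 0 = did not); not study data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.3, 0.65, 0.4, 0.55, 0.2, 0.8, 0.7]
# Assumed decision threshold of 0.5 turns scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))   # 0.625
print(precision(y_true, y_pred))  # 0.6
print(roc_auc(y_true, scores))    # 0.8125
```

Accuracy and precision depend on the chosen threshold, whereas AUC depends only on how the scores rank cases, which is one reason a model's AUC and accuracy can differ.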

Keywords

type 2 diabetes mellitus; PyCaret; machine learning; prediction; feature importance plot; SHAP value

Subject

Medicine and Pharmacology, Endocrinology and Metabolism
