Preprint Article, Version 1. This version is not peer-reviewed.

Comparative Analysis of Machine Learning Algorithms for Heart Disease Prediction Using the Cleveland Heart Disease Dataset

Version 1 : Received: 15 July 2024 / Approved: 16 July 2024 / Online: 16 July 2024 (12:35:00 CEST)

How to cite: Shrestha, D. Comparative Analysis of Machine Learning Algorithms for Heart Disease Prediction Using the Cleveland Heart Disease Dataset. Preprints 2024, 2024071333. https://doi.org/10.20944/preprints202407.1333.v1

Abstract

Predicting heart disease supports early diagnosis and intervention, which can improve patient outcomes and reduce mortality. This study compares several machine learning models, including Logistic Regression, Random Forest, Gradient Boosting, XGBoost, and Long Short-Term Memory (LSTM) networks, on the Cleveland Heart Disease dataset. Preprocessing included handling missing values, converting categorical variables to numeric form, and binarizing the target variable for binary classification. Each model was evaluated on accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). SHapley Additive exPlanations (SHAP) values were used to assess feature importance and to support model transparency and interpretability. XGBoost outperformed all other models, achieving an accuracy of 90% and an AUC-ROC of 0.94, which the study attributes to its ability to capture complex patterns in the data through its optimization techniques and regularization. The results highlight the potential of ensemble methods such as Gradient Boosting and XGBoost for heart disease prediction: they combine higher accuracy with interpretability, making them practical candidates for early diagnosis in clinical settings. Future research should focus on integrating these models into healthcare systems and exploring hybrid approaches to further improve predictive performance and clinical applicability.
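
As a rough illustration of the preprocessing and evaluation steps named in the abstract, the following is a minimal sketch in plain Python, not the author's pipeline. It assumes that missing values are marked with "?" and filled with the column median, and that any target value above 0 counts as disease; the abstract does not state either choice. All function names are hypothetical.

```python
from statistics import median


def binarize_target(num):
    """Map a multi-level severity label to 0 (no disease) or 1 (disease).

    Assumption: any value above 0 is treated as disease.
    """
    return 1 if num > 0 else 0


def impute_median(values, missing="?"):
    """Replace missing markers with the median of the observed values.

    Assumption: median imputation; the abstract only says missing
    values were handled.
    """
    observed = [float(v) for v in values if v != missing]
    fill = median(observed)
    return [fill if v == missing else float(v) for v in values]


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice a library such as scikit-learn would normally supply these metrics; the hand-written versions here only make the definitions explicit.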

Keywords

Heart Disease Prediction, Machine Learning, XGBoost, Gradient Boosting, LSTM (Long Short-Term Memory), SHAP (SHapley Additive exPlanations)

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
