Preprint Article, Version 1. This version is not peer-reviewed.

Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects

Version 1 : Received: 6 September 2024 / Approved: 9 September 2024 / Online: 10 September 2024 (12:39:32 CEST)

How to cite: Dong, T.; Oronti, I. B.; Sinha, S.; Freitas, A.; Zhai, B.; Chan, J.; Fudulu, D. P.; Caputo, M.; Angelini, G. D. Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects. Preprints 2024, 2024090698. https://doi.org/10.20944/preprints202409.0698.v1

Abstract

Background: Ensemble tree-based models such as Xgboost are highly prognostic in cardiovascular medicine, as measured by the Clinical Effectiveness Metric (CEM). However, their ability to handle correlated data, such as hospital-level effects, is limited.
Objectives: The aim of this work is to develop a binary outcome mixed effects Xgboost (BME) model that integrates random effects at the hospital level. To ascertain how well the model handles correlated data in cardiovascular outcomes, we aim to assess its performance and compare it with fixed effects Xgboost and traditional logistic regression models.
Methods: A total of 227,087 patients over 17 years of age who underwent cardiac surgery at 42 UK hospitals between 1 January 2012 and 31 March 2019 were included. The dataset was split into two cohorts: Training/Validation (n = 157,196; 2012-2016) and Holdout (n = 69,891; 2017-2019). The outcome variable was 30-day mortality, with hospital as the clustering variable. Logistic regression, mixed effects logistic regression, Xgboost, and binary outcome mixed effects Xgboost (BME) were fitted to both standardized and unstandardized datasets across a range of sample sizes, and their predictive performance metrics were compared to identify the best approach.
Results: An exploratory analysis found high variability in hospital-level mortality across datasets, which supported the adoption of mixed effects models. Unstandardized Xgboost BME showed marked improvements in predictive power over the Xgboost model at small sample sizes, but the performance differences decreased as dataset sizes increased. Generalized linear models (glm) and generalized linear mixed-effects models (glmer) showed a similar pattern, with Xgboost models also excelling at larger sample sizes.
Conclusions: These findings suggest that integrating mixed effects into machine learning models can enhance their applicability in clinical settings where sample sizes are small, such as rare conditions.
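For readers unfamiliar with hospital-level random effects, a minimal sketch of the general idea may help. It assumes a random-intercept formulation, logit P(y_ij = 1) = f(x_ij) + b_j with b_j ~ N(0, sigma^2) for patient i in hospital j. The model is fitted by alternating a fixed-effects step with a per-hospital random-effects step and updating sigma^2 in an approximate, EM-style way. This is not the authors' BME implementation. To keep the example self-contained, a Newton-Raphson logistic regression stands in for Xgboost as f, the data are simulated, and all function names are hypothetical. In an Xgboost version, the per-patient offset b_j would typically be passed as a base margin (for example, the base_margin argument of xgboost.DMatrix).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, offset, n_iter=25, ridge=1e-6):
    """Fixed-effects step (hypothetical stand-in for Xgboost): Newton-Raphson
    logistic regression with a known per-patient offset on the log-odds scale."""
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xd @ beta + offset)
        grad = Xd.T @ (y - p)
        info = Xd.T @ (Xd * (p * (1 - p))[:, None]) + ridge * np.eye(len(beta))
        beta += np.linalg.solve(info, grad)  # Newton step
    # f(x): fixed-effects log-odds for new patients (no hospital term)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta


def fit_random_intercepts(y, margin, groups, sigma2, n_iter=25):
    """Random-effects step: MAP estimate of b_j ~ N(0, sigma2) for each hospital,
    holding the fixed-effects log-odds `margin` constant. Also returns an
    approximate posterior variance (inverse negative Hessian) per hospital."""
    labels = np.unique(groups)
    b = np.zeros(len(labels))
    var = np.zeros(len(labels))
    for k, g in enumerate(labels):
        m = groups == g
        bk = 0.0
        for _ in range(n_iter):
            p = sigmoid(margin[m] + bk)
            info = np.sum(p * (1 - p)) + 1.0 / sigma2
            bk += (np.sum(y[m] - p) - bk / sigma2) / info  # Newton step
        b[k], var[k] = bk, 1.0 / info
    return labels, b, var


def fit_mixed_logistic(X, y, groups, n_outer=10, sigma2=1.0):
    """Alternate both steps; update sigma2 as mean(b_j^2 + Var(b_j)), an
    approximate (Laplace-based) EM update for the variance component."""
    labels = np.unique(groups)
    idx = np.searchsorted(labels, groups)  # hospital index for each patient
    b = np.zeros(len(labels))
    for _ in range(n_outer):
        f = fit_logistic(X, y, offset=b[idx])
        labels, b, var = fit_random_intercepts(y, f(X), groups, sigma2)
        sigma2 = float(np.mean(b ** 2 + var))
    return f, dict(zip(labels.tolist(), b.tolist())), sigma2


# Simulated example: 20 hypothetical hospitals with 500 patients each
rng = np.random.default_rng(0)
n_hosp, n_per = 20, 500
groups = np.repeat(np.arange(n_hosp), n_per)
true_b = rng.normal(0.0, 0.5, n_hosp)  # true hospital effects, sigma = 0.5
X = rng.normal(size=(n_hosp * n_per, 2))
true_logit = -2.0 + X @ np.array([0.8, -0.5]) + true_b[groups]
y = (rng.random(len(groups)) < sigmoid(true_logit)).astype(float)

f, b_hat, sigma2_hat = fit_mixed_logistic(X, y, groups)
est = np.array([b_hat[j] for j in range(n_hosp)])
print("corr(estimated, true intercepts):", np.corrcoef(est, true_b)[0, 1])
print("estimated sigma^2:", sigma2_hat)
```

On the simulated data, the estimated hospital intercepts should track the true ones closely. With fewer patients per hospital, the estimates are shrunk more strongly toward zero. This shrinkage is the mechanism by which random effects borrow strength across small clusters.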

Keywords

machine learning; AI; random effects; cardiovascular medicine; risk prediction; expectation-maximization; xgboost

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
