Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects

Preprint

Article

Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects

Altmetrics

Downloads

Views

Comments

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted:

06 September 2024

Posted:

10 September 2024

You are already at the latest version

Alerts

Abstract

Background: Ensemble tree-based models such as Xgboost are highly prognostic in cardiovascular medicine, as measured by the Clinical Effectiveness Metric (CEM). However, their ability to handle correlated data, such as hospital-level effects, are limited. Objectives: The aim of this work is to develop a binary outcome mixed effects Xgboost (BME) model that integrates random effects at the hospital level. To ascertain how well the model handles correlated data in cardiovascular outcomes, we aims to assess its performance and compare it to fixed effects Xgboost and traditional logistic regression models. Methods: A total of 227,087 patients over 17 years of age, undergoing cardiac surgery from 42 UK hospitals between 1 Jan 2012 and 31 Mar 2019 were included. The dataset was split into two cohorts: Training/Validation (n = 157196; 2012-2016) and Holdout (n = 69891; 2017-2019). The outcome variable was 30 days mortality with hospitals considered as clustering variable. The logistic regression, mixed effects logistic regression, Xgboost and binary outcome mixed effects Xgboost (BME) were fitted to both standardized and unstandardized datasets across a range of sample sizes and the estimated prediction power metrics were compared to identify the best approach. Results: The exploratory study found high variability in hospital-related mortality across datasets, which supported the adoption of mixed effects models. Unstandardized Xgboost BME demonstrated marked improvements in predictor power over the Xgboost model at small sample size ranges, but performance differences decreased as dataset sizes increased. Generalized linear models (glm) and generalized linear mixed-effects models (glmer) models followed similar results, with Xgboost models excelling also at greater sample sizes. Conclusions: These findings suggest that integrating mixed effects into machine learning models can enhance their applicability in clinical settings where sample size is small such as rare conditions.

Keywords:

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

MDPI Initiatives

Important Links

Choose an area of interest and we will send you notifications of new preprints at your preferred frequency.

Disclaimer

Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects

Abstract

MDPI Initiatives

Important Links

Subscribe