3.1. Predictive Analytics for Heart Failure Prediction
The recent studies reviewed here indicate that artificial intelligence, specifically machine learning, plays an important role in predicting heart failure and its risk from electronic medical records. This approach can support clinical decision-making and the diagnosis of patients with heart failure. Study [5] designed a clinical decision support system (CDSS) that uses machine learning to evaluate the severity of heart failure (HF) in patients with HF. The study built its predictive models with machine learning algorithms and evaluated each model by cross-validation. It concluded that a machine learning-based CDSS is useful for diagnosing heart failure and that its output is readable even by non-cardiologists and non-clinician users.
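The cross-validation step mentioned above can be illustrated with a minimal sketch. It is not the procedure of [5]: the threshold "model", the single feature, and the toy data are hypothetical stand-ins chosen only to show how k-fold evaluation holds out each fold in turn.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return the held-out accuracy obtained on each of the k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

# Hypothetical stand-in model: a cut-off halfway between the two class means.
def fit_threshold(X, y):
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else 0

# Toy, well-separated data (illustrative only, not patient data).
X = [[v] for v in (1, 2, 3, 4, 5, 11, 12, 13, 14, 15)]
y = [0] * 5 + [1] * 5
scores = cross_validate(X, y, fit_threshold, predict_threshold, k=5)
print(scores)
```

Averaging the per-fold scores gives a single performance estimate that does not depend on one particular train/test split.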
Studies [6] and [7], involving more than 400,000 primary care patients, used machine learning algorithms to diagnose heart failure in primary care patient data collected by the Geisinger Clinic. Both studies predicted heart failure in different time windows and showed good performance. The models in [6] were built from both unstructured and structured data, while the models in [7] were built by processing EHR data. Similarly, [8] used data from real patients at King Saud University Medical City (KSUMC), manually extracting the necessary patient information and feeding it to predictive models built with a machine learning approach in a big data environment. The study used PCA for pre-processing and feature reduction to obtain promising predictive results. It performed well in predicting heart failure, although it included only 100 patients.
Going beyond machine learning alone, study [9] involved more than one million elderly patients from Medicare data in the USA and built a trajectory-based disease progression model to predict heart failure among unseen patients. Study [10] applied the Cox proportional hazards model to predict the risk of heart failure in patients enrolled in COOL-AF Thailand between 2014 and 2017. The study evaluated its predictive model using the C-index, D-statistic, calibration plot, Brier score, and survival analysis. The proposed model provided good prediction of heart failure.
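To make the C-index mentioned above concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration under simplifying assumptions (no special handling of tied event times), not the evaluation code of [10]; the follow-up times, event flags, and risk scores are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient
    with the higher predicted risk experiences the event first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Invented example: follow-up time, event flag (1 = HF, 0 = censored),
# and a model's predicted risk for four patients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.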
Study [11] described a predictive analytics approach to predicting heart failure with preserved ejection fraction (HFpEF), a subtype of heart failure. The study built predictive models using five different machine learning algorithms and assessed them using the c-statistic, Brier score, sensitivity, and specificity. It reported that predictive analytics accurately predicted the presence of HFpEF in patients with heart failure. Similarly, [12] used three algorithms to predict and identify acute decompensated heart failure (ADHF) in patients at Tisch Hospital, USA. It assessed each model with AUC, sensitivity, and PPV to determine which performed best, and found that a machine learning-based predictive model best predicted ADHF.
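The evaluation measures named in this paragraph follow directly from the confusion matrix and the predicted probabilities. The sketch below computes sensitivity, specificity, PPV, and the Brier score on invented labels and probabilities; the 0.5 decision threshold is an assumption, not a value reported by [11] or [12].

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)

# Invented outcomes and predicted probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1, 0.4, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)   # share of true cases that were detected
specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
ppv = tp / (tp + fp)           # share of positive predictions that are right
brier = brier_score(y_true, probs)
print(sensitivity, specificity, ppv, brier)
```

Sensitivity, specificity, and PPV depend on the chosen threshold, whereas the Brier score evaluates the probabilities themselves.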
Study [13] used datasets provided by University College Dublin (UCD) and the Department of Cardiology of the University Hospital of Ioannina that included 487 patients segregated by type of heart failure. The study removed features with 50% or more missing values, removed discrete features with unbalanced distributions, and detected and corrected outliers and typos in the dataset. It then balanced the classes by applying an under-sampling technique. Through this approach the authors obtained an ideal dataset. Dividing the main dataset into a sub-dataset for each type of heart failure generated promising classification results.
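Two of the cleaning steps described above, dropping features with at least 50% missing values and random under-sampling, can be sketched as follows. This is an illustration only: the column names (`ef`, `bnp`, `hf_type`) and the records are hypothetical, and the exact under-sampling variant used in [13] is not specified here.

```python
import random

def drop_sparse_features(rows, max_missing=0.5):
    """Drop every column whose share of missing (None) values is >= max_missing."""
    names = list(rows[0].keys())
    keep = [c for c in names
            if sum(r[c] is None for r in rows) / len(rows) < max_missing]
    return [{c: r[c] for c in keep} for r in rows]

def undersample(rows, label_key, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_key], []).append(r)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced

# Hypothetical records; None marks a missing value.
rows = [
    {"ef": 35, "bnp": None, "hf_type": "HFrEF"},
    {"ef": 30, "bnp": None, "hf_type": "HFrEF"},
    {"ef": 28, "bnp": 900, "hf_type": "HFrEF"},
    {"ef": 40, "bnp": None, "hf_type": "HFrEF"},
    {"ef": 55, "bnp": 400, "hf_type": "HFpEF"},
    {"ef": 60, "bnp": None, "hf_type": "HFpEF"},
]
cleaned = drop_sparse_features(rows)          # "bnp" is 4/6 missing, so it goes
balanced = undersample(cleaned, "hf_type")    # two records per class remain
print(len(balanced), sorted(cleaned[0].keys()))
```

Under-sampling discards majority-class records, so it trades data volume for balance; that trade-off matters on a dataset of only a few hundred patients.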
These studies show that the predictive analytics approach can predict heart failure effectively. Furthermore, the approach can also explain how the predictive models arrive at their predictions. Study [14] describes the use of model interpretation and feature importance explanation to give physicians an understanding of the models. It trained five machine learning algorithms on 5,004 data sets from the Medical University Hospital in Shanxi Province, China, and applied the Shapley additive explanations (SHAP) approach to the best-performing model to interpret the model and its feature importance. To demonstrate the effectiveness of machine learning-based predictive analytics, the experimental study in [15] showed that, in predicting cardiovascular disease, of which heart failure is one type, machine learning methods outperformed established risk scales such as SCORE and REGICOR. The study was part of an analytical observational study of the ESCARVAL RISK cohort in Spain, in which the machine learning-based predictive models were compared with SCORE and REGICOR in predicting the cardiovascular risk of patients in the cohort.
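The idea behind SHAP can be illustrated by computing exact Shapley values for a very small model. The sketch below is not the SHAP library and not the model of [14]: the linear risk function, its coefficients, the three features, and the single reference patient used as a baseline are all assumptions made for illustration. Exact enumeration is feasible only for a handful of features; practical tools estimate these values more efficiently.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value, the rest the baseline's.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical risk model over (age, ejection fraction, diabetes flag).
def risk(features):
    age, ef, diabetes = features
    return 0.03 * age - 0.02 * ef + 0.4 * diabetes

patient = [70, 30, 1]
reference = [60, 55, 0]   # assumed baseline patient
phi = shapley_values(risk, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's prediction and the baseline prediction, which is the property that makes per-feature explanations of this kind additive.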
Many studies, like those discussed above, are cohort studies or use hospital data. However, some studies build predictive models using open, public data from repositories such as PhysioNet and the UCI Machine Learning Repository. Researchers commonly use the heart disease and heart failure datasets from the UCI repository to build models that predict heart failure and the risk of heart failure. Studies [16], [17], and [18] used the Heart Disease dataset from the UCI repository and built their models with various machine learning algorithms. They assessed each model with classification metrics such as accuracy, precision, recall, F1 score, and AUC-ROC.
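As a brief illustration of two of the metrics listed above, the sketch below computes the F1 score and the AUC-ROC, the latter through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels, scores, and 0.5 threshold are invented for the example.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented labels and model scores for six patients.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(auc, f1, accuracy)
```

AUC-ROC is threshold-free, while accuracy, precision, recall, and F1 all change with the decision threshold, which is one reason studies usually report several of them together.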
Because the Heart Disease dataset of the UCI repository has missing values, these must be handled before modeling. Study [19] removed all missing values from the dataset. However, this technique might bias the prediction results. Study [20] instead imputed the missing values using the k-nearest neighbor imputation technique and reported better performance than the previous study, which only removed the missing values without imputation.
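A simplified version of k-nearest neighbor imputation is sketched below: each missing entry is replaced by the mean of that feature over the k most similar records that have it. This is not the implementation used in [20]; the distance definition, k = 2, and the three columns (loosely age, cholesterol, and maximum heart rate) are assumptions, and features are left unscaled for brevity although scaling is advisable in practice.

```python
def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest donor rows."""

    def distance(a, b):
        # Root mean squared difference over the features both rows have.
        shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
        if not shared:
            return float("inf")
        return (sum((u - v) ** 2 for u, v in shared) / len(shared)) ** 0.5

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is None:
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )
                nearest = donors[:k]
                if nearest:  # leave the gap if no donor has this feature
                    filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled

# Hypothetical records; None marks a missing value.
rows = [
    [63, 233, 150],
    [64, 240, None],
    [41, 180, 172],
    [62, 236, 148],
    [40, None, 175],
]
filled = knn_impute(rows, k=2)
print(filled[1], filled[4])
```

Unlike deletion, this keeps every record in the dataset, which matters for a dataset as small as the UCI Heart Disease data.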
Study [21] showed that addressing class imbalance might improve performance. The study used the SMOTE technique to address the imbalance in the Heart Disease dataset and built predictive models with six different algorithms. It showed that balancing the data before building the predictive models improved predictive performance. However, because the class counts in this dataset differ only slightly, the imbalance correction was not strictly necessary. Study [22] also addressed imbalance, this time in the Heart Failure dataset, using SMOTE-ENN, which combines SMOTE with ENN (edited nearest neighbors). The authors scaled and standardized the dataset before addressing the imbalance. The study showed improved prediction performance compared with models built without balancing and normalization. Like the previous study, [23] used the SMOTE technique and additionally performed feature selection to obtain an ideal dataset with the most important features. The study achieved good classification performance compared with the other studies mentioned.
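The core of SMOTE is interpolation between a minority-class sample and one of its nearest minority-class neighbors. The sketch below shows only that step under stated assumptions (k = 2, Euclidean distance, invented two-feature points); it omits the ENN cleaning stage of SMOTE-ENN and is not the code of [21], [22], or [23].

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic

# Invented minority-class samples with two already-scaled features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because the synthetic points depend on distances, scaling the features first, as [22] did, affects which neighbors are chosen.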
Like [23], studies [24] and [25] implemented feature selection and optimization to obtain the most important features of the Heart Disease dataset from the UCI repository. Study [24] adopted the KS-test to select the optimal attributes and built a predictive model using a decision tree algorithm. However, when evaluated with the Matthews correlation coefficient, the proposed model did not improve significantly. In [25], the Lasso algorithm was used to select the features of the Heart Disease dataset. Compared with the previous study, this study achieved better classification performance in predicting heart failure. These experimental studies pre-processed the data to obtain ideal versions of the UCI Heart Disease and Heart Failure datasets, so their predictive models performed well. Study [26] performed feature selection using Pearson's correlation to obtain an ideal version of the UCI Heart Disease dataset after discarding variables with missing values. The authors named the resulting pre-processed dataset 'Satvi'. The study shows improved predictive model performance on the UCI Heart Disease dataset.
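Correlation-based feature selection of the kind described for [26] can be sketched as a simple filter: compute Pearson's correlation between each feature and the target, and keep the features whose absolute correlation reaches a cut-off. The 0.5 cut-off, the feature names, and the values below are hypothetical; the criterion actually used in [26] is not reproduced here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(columns, target, threshold=0.5):
    """Keep features whose absolute correlation with the target is >= threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Invented data: one informative feature and one weakly related feature.
columns = {
    "age": [40, 50, 60, 70, 45, 65],
    "noise": [1, 2, 1, 2, 2, 1],
}
target = [0, 0, 1, 1, 0, 1]
selected = select_features(columns, target)
print(selected)
```

A filter of this kind captures only linear, one-feature-at-a-time relationships, which is one reason other studies in this group used alternatives such as the KS-test or Lasso.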
Unlike other studies that used fundamental or ensemble machine learning to build their predictive models, [27] used a big data approach with the UCI Heart Disease dataset to analyze and predict heart status. The study implemented a clustering technique to filter out unnecessary data and improve prediction effectiveness. The proposed clustering-based model achieved outstanding predictive performance with high CPU utilization and low processing time. In another study, [28], applying quantum computing to machine learning and deep learning algorithms resulted in better predictive performance than conventional machine learning algorithms. The study addressed the dimensionality and size problem of the UCI Heart Disease dataset by using quantum computing to build predictive models, then compared their classification performance with that of other machine learning algorithms.
Like the UCI repository, PhysioNet provides various datasets in the field of health and medicine. Study [29] used datasets from the MIT-BIH and BIDMC databases, which are publicly accessible through PhysioNet. The study aimed to predict heart failure by analyzing the ECG signals in these datasets. By implementing a deep learning algorithm, it built a predictive model with good accuracy, sensitivity, and specificity in predicting heart disease. Using the MIMIC-III dataset from PhysioNet, [30] predicted the length of stay of patients with heart failure using several machine learning models, which provided good prediction and model performance in the evaluation stage. Overall, these studies give evidence of the benefit of using predictive analytics to predict heart failure and the risk of heart failure. The predictive analytics approach will be of great help to participants in the health sector, especially clinicians and physicians, by making it easier to identify and diagnose heart failure and to estimate the risk of heart failure at an early stage.