This section presents the results of the simulations we performed to evaluate the XGBoost and Random Forest classification techniques for predicting customer churn. It consists of two main parts: setup and results.
B. Results
The following table presents a comparative analysis of the Random Forest (RF) and XGBoost models on the customer churn dataset, with and without applying the GNUS sampling technique. The performance of the models is evaluated using several key metrics: Accuracy, Precision, Recall, F1-Score, and ROC-AUC.
Table 1.
Evaluation metrics.

Model             Accuracy   Precision   Recall    F1-Score   ROC-AUC
RF-Initial        91.76%     95.12%      43.58%    59.77%     83.25%
XGBoost-Initial   92.94%     90.09%      55.87%    68.97%     87.61%
RF-GNUS           90.67%     77.27%      47.49%    58.88%     81.32%
XGBoost-GNUS      91.61%     83.96%      49.72%    62.46%     83.15%
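For reference, the five metrics in Table 1 can be computed with scikit-learn. The sketch below uses small placeholder arrays, not the study's data, purely to show the calls involved:

```python
# Illustrative only: y_true, y_pred, and y_score are placeholder
# arrays, not the churn dataset used in this study.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]   # ground-truth churn labels
y_pred  = [0, 0, 1, 0, 0, 1, 1, 1]   # hard class predictions
y_score = [0.1, 0.2, 0.9, 0.4, 0.3, 0.8, 0.6, 0.7]  # churn probabilities

print(f"Accuracy : {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall   : {recall_score(y_true, y_pred):.4f}")
print(f"F1-Score : {f1_score(y_true, y_pred):.4f}")
print(f"ROC-AUC  : {roc_auc_score(y_true, y_score):.4f}")
```

Note that ROC-AUC is computed from the predicted probabilities (`y_score`), while the other four metrics use the thresholded class predictions (`y_pred`).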
Accuracy:
XGBoost-Initial demonstrated the highest accuracy, achieving 92.94%, slightly outperforming the RF-Initial model, which attained an accuracy of 91.76%. These figures suggest that XGBoost is better at correctly predicting churn and non-churn instances than the Random Forest model.
After applying GNUS sampling, both models experienced a slight reduction in accuracy: XGBoost-GNUS achieved 91.61% and RF-GNUS 90.67%. This decrease is expected, since the sampling shifts the focus toward improving predictions for the minority class (churn), which can slightly compromise overall accuracy.
Precision:
Precision is the proportion of correctly predicted churn cases out of all cases predicted as churn. The RF-initial model exhibited the highest precision, scoring 95.12%, indicating that the Random Forest model was very conservative in predicting churn, leading to fewer false positives.
However, this precision came at the expense of recall, as the RF model identified fewer actual churn cases. XGBoost-initial also performed well in precision, scoring 90.09%.
After applying GNUS, precision decreased for both models, reflecting a shift in emphasis from precision toward recall: RF-GNUS recorded a precision of 77.27%, and XGBoost-GNUS a precision of 83.96%. This decrease suggests that the models predicted churn more frequently after GNUS sampling, resulting in more false positives.
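The conservative-versus-liberal behaviour described above falls directly out of the confusion-matrix definitions. The following sketch uses hypothetical counts (not the study's actual confusion matrices) to illustrate the trade-off:

```python
# Hypothetical confusion-matrix counts, for illustration only.
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Conservative model: few positive predictions -> high precision, low recall.
p1, r1 = precision_recall(tp=40, fp=2, fn=60)
# Liberal model: more positive predictions -> more false positives,
# lower precision, but higher recall.
p2, r2 = precision_recall(tp=70, fp=20, fn=30)

print(f"conservative: precision={p1:.2f}, recall={r1:.2f}")  # precision=0.95, recall=0.40
print(f"liberal:      precision={p2:.2f}, recall={r2:.2f}")  # precision=0.78, recall=0.70
```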
Recall:
Recall, which measures the ability to identify actual churn cases correctly, was significantly lower in the initial models, particularly for the RF-initial model, which had a recall of 43.58%. This indicates that the initial Random Forest model struggled to identify many churn cases despite its high precision.
XGBoost-initial performed better, with a recall of 55.87%, demonstrating a more balanced approach between precision and recall.
After applying GNUS sampling, the RF-GNUS model increased its recall from 43.58% to 47.49%, indicating that GNUS helped Random Forest capture more of the actual churn instances, which is crucial for customer churn prediction. XGBoost-GNUS, by contrast, recorded a recall of 49.72%, below XGBoost-initial's 55.87%, so the sampling did not yield a corresponding recall gain for XGBoost.
F1-Score:
The F1-Score was highest for the XGBoost-initial model, with a score of 68.97%. This reflects its better overall balance in identifying churn cases while maintaining precision.
The RF-initial model had a significantly lower F1-score of 59.77%, as it prioritized precision over recall, leading to fewer correctly predicted churn cases.
After GNUS sampling, the F1-scores of both models decreased, with RF-GNUS scoring 58.88% and XGBoost-GNUS scoring 62.46%. This drop reflects the trade-off between recall and precision introduced by the sampling.
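As a sanity check, the F1-Scores follow from the harmonic-mean formula F1 = 2·P·R / (P + R) applied to the precision and recall values reported above. Because those inputs are rounded percentages, small discrepancies with Table 1 are expected:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (inputs in percent)."""
    return 2 * precision * recall / (precision + recall)

print(f"XGBoost-initial: {f1(90.09, 55.87):.2f}")  # ~68.97, matching the reported value
print(f"RF-initial:      {f1(95.12, 43.58):.2f}")  # ~59.77, matching the reported value
print(f"XGBoost-GNUS:    {f1(83.96, 49.72):.2f}")  # ~62.46, matching the reported value
print(f"RF-GNUS:         {f1(77.27, 47.49):.2f}")  # ~58.83 (Table 1 reports 58.88;
                                                   #  gap due to rounded inputs)
```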
ROC-AUC:
The ROC-AUC metric, which measures the model’s ability to distinguish between churn and non-churn classes, shows that XGBoost-initial outperformed the other models with an AUC of 87.61%, indicating that it was the best at distinguishing between the two classes.
The RF-initial model had an AUC of 83.25%, which is respectable but shows a weaker ability to separate the classes than XGBoost.
After applying GNUS, both models experienced a slight reduction in their ROC-AUC scores. XGBoost-GNUS had an AUC of 83.15%, and RF-GNUS had an AUC of 81.32%. This reduction aligns with the focus on improving recall at the expense of some precision and overall classification ability.
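ROC-AUC has a useful probabilistic reading: it equals the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner (ties counted as half). A minimal sketch with illustrative scores, not the study's model outputs:

```python
import itertools

# Illustrative predicted churn probabilities, for demonstration only.
churn_scores     = [0.9, 0.7, 0.6, 0.4]   # actual churners
non_churn_scores = [0.8, 0.3, 0.2, 0.1]   # actual non-churners

# AUC = P(score of random churner > score of random non-churner),
# with ties contributing 0.5.
pairs = list(itertools.product(churn_scores, non_churn_scores))
auc = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs) / len(pairs)
print(f"AUC = {auc:.4f}")  # 13 of 16 pairs correctly ordered -> 0.8125
```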
In summary, the XGBoost model consistently outperformed the Random Forest model across most metrics, both before and after applying GNUS sampling. XGBoost-initial showed the best balance between precision and recall and achieved the highest F1-score and ROC-AUC. After applying GNUS sampling, the Random Forest model improved in recall, making it more effective at identifying churn cases, though at the cost of some precision and overall accuracy for both models. These results highlight the challenge of balancing precision and recall in customer churn prediction and demonstrate the benefits of techniques like GNUS sampling for improving the identification of minority-class instances.