Version 1
Received: 28 October 2024 / Approved: 29 October 2024 / Online: 30 October 2024 (05:01:51 CET)
How to cite:
Imani, M. Evaluating an Ensemble of Random Forest and XGBoost with Gaussian Noise Upsampling Technique for Customer Churn Prediction. Preprints 2024, 2024102329. https://doi.org/10.20944/preprints202410.2329.v1
APA Style
Imani, M. (2024). Evaluating an Ensemble of Random Forest and XGBoost with Gaussian Noise Upsampling Technique for Customer Churn Prediction. Preprints. https://doi.org/10.20944/preprints202410.2329.v1
Chicago/Turabian Style
Imani, M. 2024. "Evaluating an Ensemble of Random Forest and XGBoost with Gaussian Noise Upsampling Technique for Customer Churn Prediction." Preprints. https://doi.org/10.20944/preprints202410.2329.v1
Abstract
Customer churn is a critical challenge for subscription-based businesses, especially in telecommunications, where retaining customers is essential to profitability. This study investigates the efficacy of two machine learning (ML) models, XGBoost and Random Forest, for predicting customer churn on a publicly available telecommunications dataset. The dataset's class imbalance poses a central challenge, which is addressed with the Gaussian Noise Upsampling (GNUS) technique. The two models are evaluated and compared on standard performance metrics (precision, recall, accuracy, F1-score, and ROC-AUC), both with and without GNUS. The results indicate that, without GNUS, XGBoost outperforms Random Forest on most metrics, and that applying GNUS improves recall for both models, particularly in identifying churn cases. However, this gain in recall comes at the cost of lower precision and overall accuracy. The findings underline the importance of appropriate sampling techniques for handling class imbalance in churn prediction and offer insights for developing proactive customer retention strategies.
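The abstract names GNUS without spelling out its mechanics. The following is a minimal Python sketch of one common reading of Gaussian noise upsampling, assuming NumPy: minority-class (churn) rows are resampled with replacement and perturbed with zero-mean Gaussian noise until both classes are the same size. The function name `gaussian_noise_upsample` and the `noise_scale` parameter (noise standard deviation as a fraction of each feature's standard deviation) are hypothetical; the study's actual implementation and noise level are not stated here.

```python
import numpy as np


def gaussian_noise_upsample(X, y, minority_label=1, noise_scale=0.05, seed=0):
    """Balance a binary dataset with jittered copies of minority-class rows.

    Hypothetical helper for illustration only: noise_scale is an assumed
    setting, not the value used in the study.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    if len(X_min) == 0:
        raise ValueError("no rows with the minority label")
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0:
        return X, y  # already balanced, or the label is not the minority
    # Pick minority rows (with replacement) as seeds for synthetic samples.
    idx = rng.integers(0, len(X_min), size=n_needed)
    # Zero-mean Gaussian noise, scaled per feature by the minority-class std.
    sigma = noise_scale * X_min.std(axis=0)
    noise = rng.normal(0.0, 1.0, size=(n_needed, X.shape[1])) * sigma
    X_new = X_min[idx] + noise
    y_new = np.full(n_needed, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Usage: four non-churn rows and one churn row become four of each.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 2.0]])
y = np.array([0, 0, 0, 0, 1])
X_bal, y_bal = gaussian_noise_upsample(X, y)
print(np.bincount(y_bal))  # → [4 4]
```

In a typical workflow, upsampling of this kind is applied only to the training split, so the test set keeps its original class ratio and the reported precision, recall, and ROC-AUC reflect the real imbalance.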
Subject: Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.