Preprint Article, Version 1. This version is not peer-reviewed.

A Regularized Tree Forest for Classification in the Presence of Extreme Class Imbalance

Version 1: Received: 19 July 2024 / Approved: 22 July 2024 / Online: 22 July 2024 (07:13:53 CEST)

How to cite: Safi, S. K.; Gul, S. A Regularized Tree Forest for Classification in the Presence of Extreme Class Imbalance. Preprints 2024, 2024071684. https://doi.org/10.20944/preprints202407.1684.v1

Abstract

Machine-learning methods for classification are challenged by the class imbalance problem, in which one class is underrepresented. Oversampling the minority class, undersampling the majority class, or model selection for ensemble methods alone might not be effective when the class imbalance ratio is very high. To address this concern, a novel method is presented that generates synthetic data for the minority class in conjunction with an optimal tree ensemble classifier (OTEC). The method first generates synthetic minority class instances to balance the training data. It then applies OTEC, in which models are selected based on their performance on out-of-bag observations and training subsamples. A total of 20 benchmark binary classification problems with moderate to extreme class imbalance are used to assess the efficacy of the proposed method against other well-known methods: optimal tree ensemble, SMOTE random forest, under-sampling random forest, oversampling random forest, k-nearest-neighbor, support vector machine, tree, and artificial neural network. Classification error rate, sensitivity, specificity, precision, recall, and F1-score serve as performance metrics. The analyses presented in this study show that the proposed method, which combines data balancing with model selection, yielded better results than the other methods.
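The balancing step described in the abstract can be illustrated with a small sketch. The abstract does not specify how the synthetic minority instances are generated, so the code below substitutes a SMOTE-style interpolation between nearest minority neighbors; it is an assumption for illustration, not the authors' generator. The function names (`smote_like_oversample`, `balance`), the parameter `k`, and the toy data are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbors (SMOTE-style stand-in,
    not the generator used in the paper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbors)
        t = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic


def balance(X, y, minority_label=1, k=3, seed=0):
    """Append synthetic minority points until both classes have equal size.
    Assumes binary labels and at least two minority points."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote_like_oversample(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Toy data (hypothetical): 8 majority points near the origin, 3 minority points.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (0.2, 0.4), (0.4, 0.1), (0.1, 0.4), (0.3, 0.0),
     (2.0, 2.0), (2.2, 2.1), (2.1, 2.3)]
y = [0] * 8 + [1] * 3

X_bal, y_bal = balance(X, y, minority_label=1, k=2, seed=42)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In the workflow the abstract describes, the balanced data would then be passed to the tree ensemble, with individual trees retained according to their out-of-bag performance; that selection step is not shown here because the abstract gives no further detail on it.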

Keywords

machine learning; optimal tree ensemble classifier; random forest; support vector machine; artificial neural network

Subject

Computer Science and Mathematics, Probability and Statistics
