In this work, we adapt the fine-tuning algorithm of the Naïve Bayes classifier (FTNB) to make it more suitable for imbalanced datasets. In particular, for each misclassified training instance, we boost its probability terms by an amount that is inversely proportional to the harmonic mean of the corresponding terms under the actual and predicted classes. The intuition is that, when an instance is misclassified, its discriminative attributes have small probability terms in both the actual class (due to data scarcity) and the predicted class (due to weak correlation). Conversely, if both values are relatively high, then the attribute has good data coverage (support) and is unlikely to be a cause of the misclassification. Since the harmonic mean is dominated by the smaller value, and the dataset is imbalanced, a large update should be applied whenever either or both of the term probabilities for the actual and predicted classes are small. We used 60 benchmark datasets, both balanced and imbalanced, to determine whether the poor performance of the NB classifier stems from data scarcity, and we compared the proposed algorithm with NB, the original FTNB, and several recent state-of-the-art (SOTA) ensemble classifiers for imbalanced data. Our empirical results show that the proposed algorithm significantly outperforms all of these classifiers.
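To make the update intuition concrete, the following is a minimal Python sketch, not the authors' exact FTNB formula: the step size is taken as a simple decreasing function of the harmonic mean of the two probability terms, so that small terms (in either class) trigger large corrections. All names here (`eta`, `fine_tune_term`, the specific weighting) are illustrative assumptions.

```python
def harmonic_mean(a: float, b: float, eps: float = 1e-9) -> float:
    """Harmonic mean; dominated by the smaller of the two values."""
    return 2.0 * a * b / (a + b + eps)


def fine_tune_term(p_actual: float, p_pred: float, eta: float = 0.01):
    """Update one attribute probability-term pair of a misclassified instance.

    p_actual -- P(attribute value | actual class), boosted
    p_pred   -- P(attribute value | predicted class), attenuated

    The weighting eta * (1 - harmonic_mean) is a stand-in for the paper's
    exact formula; it only illustrates the inverse relationship.
    """
    step = eta * (1.0 - harmonic_mean(p_actual, p_pred))  # small terms -> big step
    return min(1.0, p_actual + step), max(1e-9, p_pred - step)


# A scarce-data term pair receives a larger correction than a well-supported one:
print(fine_tune_term(0.02, 0.03))  # small harmonic mean -> large step
print(fine_tune_term(0.60, 0.70))  # large harmonic mean -> small step
```

In a full implementation, the updated terms would also be renormalized so that the conditional probabilities for each attribute still sum to one.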
Keywords:
Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.