Preprint Article, Version 1. This version is not peer-reviewed.

Advancing Subgroup Discovery in Classification: A Novel Random PRIM-Based Classifier Compared with well-established Algorithms

Version 1 : Received: 27 September 2024 / Approved: 29 September 2024 / Online: 30 September 2024 (04:10:40 CEST)

How to cite: Nassih, R.; Berrado, A. Advancing Subgroup Discovery in Classification: A Novel Random PRIM-Based Classifier Compared with well-established Algorithms. Preprints 2024, 2024092331. https://doi.org/10.20944/preprints202409.2331.v1

Abstract

Machine learning algorithms have made significant strides, achieving high accuracy in many applications. However, traditional models often need large datasets, as they typically peel substantial portions of the data in each iteration, complicating classifier development without sufficient data. In critical fields like healthcare, there is a growing need to identify and analyze small yet significant subgroups within data. To address these challenges, we introduce a novel classifier based on the Patient Rule Induction Method (PRIM), a subgroup discovery algorithm. PRIM finds rules by peeling minimal data at each iteration, enabling the discovery of highly relevant regions. Unlike traditional classifiers, PRIM requires experts to select input spaces manually. Our innovation transforms PRIM into an interpretable classifier by starting with random input space selections for each class, then pruning rules using Metarules, and finally selecting definitive rules for the classifier. Tested against popular algorithms such as Random Forest, Logistic Regression, and XGBoost, our Random PRIM-based Classifier (R-PRIM-Cl) demonstrates comparable robustness, superior interpretability, and the ability to handle categorical and numeric variables. It discovers more rules in certain datasets, making it valuable especially in fields where understanding the model's decision-making process is as important as its predictive accuracy.
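The peeling idea mentioned above can be illustrated with a minimal sketch of generic PRIM top-down peeling on numeric features. It is not the R-PRIM-Cl implementation: the function name `prim_peel`, the parameters `alpha`, `min_support`, and `features`, the synthetic data, and the stopping rule (stop when no peel raises the in-box mean) are all assumptions made for illustration. Metarule pruning and rule selection are not covered.

```python
import numpy as np

def prim_peel(X, y, alpha=0.05, min_support=0.1, features=None):
    """Greedy top-down peeling (illustrative sketch, numeric features only).

    Starting from the box that covers all data, repeatedly trim an alpha
    fraction of the in-box points from one side of one feature, choosing
    the trim that gives the highest mean of y inside the box.
    """
    n, d = X.shape
    # Hypothetical hook: restrict peeling to a subset of columns, e.g. a
    # randomly drawn input space; defaults to all columns.
    features = range(d) if features is None else features
    lower = X.min(axis=0).astype(float)
    upper = X.max(axis=0).astype(float)
    inside = np.ones(n, dtype=bool)

    while inside.mean() > min_support:
        best = None  # (mean_after, feature, side, bound, mask)
        for j in features:
            col = X[inside, j]
            for side, q in (("lower", alpha), ("upper", 1.0 - alpha)):
                bound = np.quantile(col, q)
                if side == "lower":
                    keep = inside & (X[:, j] >= bound)
                else:
                    keep = inside & (X[:, j] <= bound)
                # Skip peels that remove nothing or violate the support floor.
                if keep.sum() == inside.sum() or keep.mean() < min_support:
                    continue
                mean_after = y[keep].mean()
                if best is None or mean_after > best[0]:
                    best = (mean_after, j, side, bound, keep)
        # Assumed stopping rule: stop when no peel improves the in-box mean.
        if best is None or best[0] <= y[inside].mean():
            break
        _, j, side, bound, inside = best
        if side == "lower":
            lower[j] = bound
        else:
            upper[j] = bound
    return lower, upper, inside

# Synthetic demo: the positive class lives in one corner of the unit square.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
y = ((X[:, 0] > 0.7) & (X[:, 1] < 0.3)).astype(float)
lower, upper, inside = prim_peel(X, y)
```

The returned `lower` and `upper` arrays read directly as an interval rule per feature, which is what makes the resulting box interpretable. Passing a random subset of column indices as `features` is one way such a sketch could mimic a random input space selection, though the procedure used in the article may differ.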

Keywords

Classification; Subgroup Discovery; Patient Rule Induction Method; Metarules; Interpretability

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
