Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Can a Transparent Machine Learning Algorithm Predict Better than Its Black-Box Counterparts?: A Benchmarking Study using 110 Diverse Datasets

Version 1 : Received: 31 May 2024 / Approved: 4 June 2024 / Online: 7 June 2024 (07:43:04 CEST)
Version 2 : Received: 27 June 2024 / Approved: 27 June 2024 / Online: 27 June 2024 (11:07:41 CEST)

How to cite: Peterson, R. A.; McGrath, M.; Cavanaugh, J. E. Can a Transparent Machine Learning Algorithm Predict Better than Its Black-Box Counterparts?: A Benchmarking Study using 110 Diverse Datasets. Preprints 2024, 2024060478. https://doi.org/10.20944/preprints202406.0478.v1

Abstract

We developed a set of novel machine learning algorithms with the goal of producing transparent models (i.e., understandable by humans) while also flexibly accounting for nonlinearity and interactions. Our methods are based on ranked sparsity, and they allow users to flexibly control the shade of opacity of black-box machine learning methods. In this work, we put our new ranked sparsity algorithms (as implemented in our new open-source R package, sparseR) to the test in a predictive model bakeoff on a diverse set of simulated and real-world data sets from the Penn Machine Learning Benchmarks database, including both regression and classification problems. We evaluate the extent to which our new human-centered algorithms can attain predictive accuracy that rivals popular black-box approaches such as neural networks, random forests, and support vector machines (SVMs), while also producing more interpretable models. Using out-of-bag error as a meta-outcome, we describe the properties of data sets in which human-centered approaches can perform as well as or better than black-box approaches. We find that interpretable approaches predicted optimally, or within 5% of the optimal method, in most real-world data sets. These results provide a strong rationale for including human-centered, transparent algorithms such as ours in predictive modeling applications.

Keywords

model selection; feature selection; lasso; explainable machine learning

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
