Version 1: Received: 31 May 2024 / Approved: 4 June 2024 / Online: 7 June 2024 (07:43:04 CEST)
Version 2: Received: 27 June 2024 / Approved: 27 June 2024 / Online: 27 June 2024 (11:07:41 CEST)
How to cite:
Peterson, R. A.; McGrath, M.; Cavanaugh, J. E. Can a Transparent Machine Learning Algorithm Predict Better than Its Black-Box Counterparts?: A Benchmarking Study using 110 Diverse Datasets. Preprints 2024, 2024060478. https://doi.org/10.20944/preprints202406.0478.v1
APA Style
Peterson, R. A., McGrath, M., & Cavanaugh, J. E. (2024). Can a Transparent Machine Learning Algorithm Predict Better than Its Black-Box Counterparts?: A Benchmarking Study using 110 Diverse Datasets. Preprints. https://doi.org/10.20944/preprints202406.0478.v1
Chicago/Turabian Style
Peterson, R. A., Max McGrath, and Joseph E. Cavanaugh. 2024. "Can a Transparent Machine Learning Algorithm Predict Better than Its Black-Box Counterparts?: A Benchmarking Study using 110 Diverse Datasets." Preprints. https://doi.org/10.20944/preprints202406.0478.v1
Abstract
We developed a set of novel machine learning algorithms with the goal of producing transparent models (i.e., models understandable by humans) while also flexibly accounting for nonlinearity and interactions. Our methods are based on ranked sparsity and allow flexible, user-controlled variation in the shade of opacity of black-box machine learning methods. In this work, we put our new ranked sparsity algorithms (as implemented in our new open-source R package, sparseR) to the test in a predictive model bakeoff on a diverse set of simulated and real-world data sets from the Penn Machine Learning Benchmarks database, including both regression and classification problems. We evaluate the extent to which our new human-centered algorithms can attain predictive accuracy that rivals popular black-box approaches such as neural networks, random forests, and SVMs, while also producing more interpretable models. Using out-of-bag error as a meta-outcome, we describe the properties of data sets in which human-centered approaches can perform as well as or better than black-box approaches. We find that interpretable approaches predicted optimally, or within 5% of the optimal method, in most real-world data sets. We provide a strong rationale for including human-centered transparent algorithms such as ours in predictive modeling applications.
Keywords
model selection; feature selection; lasso; explainable machine learning
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.