Version 1
Received: 14 June 2024 / Approved: 15 June 2024 / Online: 17 June 2024 (08:33:48 CEST)
How to cite:
Bittmann, F. Just Impute, the Details Don’t Matter?! Quantifying the Influence of (Mostly) Arbitrary Decisions on the Quality of Imputation Results. Preprints 2024, 2024061064. https://doi.org/10.20944/preprints202406.1064.v1
APA Style
Bittmann, F. (2024). Just Impute, the Details Don’t Matter?! Quantifying the Influence of (Mostly) Arbitrary Decisions on the Quality of Imputation Results. Preprints. https://doi.org/10.20944/preprints202406.1064.v1
Chicago/Turabian Style
Bittmann, F. 2024. "Just Impute, the Details Don’t Matter?! Quantifying the Influence of (Mostly) Arbitrary Decisions on the Quality of Imputation Results." Preprints. https://doi.org/10.20944/preprints202406.1064.v1
Abstract
Multiple imputation by chained equations (MICE) is a popular and well-researched statistical approach to account for missing data in empirical research. However, applying the technique requires researchers to set dozens of smaller and larger parameters to build an adequate imputation model before estimating the analytical model of interest. It remains unclear how strongly small and potentially arbitrary decisions influence the quality of the resulting findings. The current study tests this empirically using a simulation approach with two dataset specifications, one with a low and one with a high share of missing data points, where the data are missing at random (MAR). For each of the two specifications, 4,200 simulations are conducted in which multiple parameters are randomly varied to create a diverse range of potential imputation models. The results demonstrate that even random imputation models can significantly reduce bias compared to listwise deletion: more than 90% of all simulations give less biased results. All minor decisions taken together explain between 11% and 21% of the variation in the statistics of interest. The most substantial influence on this variation is due to the selection of the imputation algorithm within MICE. These findings demonstrate that almost any imputation model, even one built with little care, will often yield higher-quality results than listwise deletion, as long as critical assumptions hold, namely that the data are MAR and that severe misspecifications are avoided.
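The comparison described in the abstract can be illustrated with a minimal sketch. This is not the author's simulation code and it is not full MICE: it assumes a single incomplete variable `y`, one fully observed predictor `x`, and a simplified stochastic regression imputation that does not draw the regression parameters from their posterior. The function name `simulate`, its parameters, and the data-generating model are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate(n=5000, m=20, seed=1):
    """Compare listwise deletion with a simplified multiple imputation under MAR.

    Illustrative assumptions (not taken from the paper):
    y = 2 + x + noise, and y is more likely to be missing when x is large.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2.0 + xi + rng.gauss(0, 1) for xi in x]

    # MAR mechanism: the probability that y is missing depends only on the
    # fully observed x, never on y itself.
    observed = [rng.random() > 1 / (1 + math.exp(-1.5 * xi)) for xi in x]

    x_obs = [xi for xi, o in zip(x, observed) if o]
    y_obs = [yi for yi, o in zip(y, observed) if o]

    # Listwise deletion: estimate the mean of y from complete cases only.
    listwise = statistics.fmean(y_obs)

    # Imputation model: ordinary least squares of y on x, fitted on complete cases.
    mean_x, mean_y = statistics.fmean(x_obs), statistics.fmean(y_obs)
    sxx = sum((a - mean_x) ** 2 for a in x_obs)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x_obs, y_obs)) / sxx
    intercept = mean_y - slope * mean_x
    resid_sd = math.sqrt(
        sum((b - intercept - slope * a) ** 2 for a, b in zip(x_obs, y_obs))
        / (len(x_obs) - 2)
    )

    # Create m completed datasets by adding residual noise to the predictions,
    # then pool the m point estimates by averaging them.
    estimates = []
    for _ in range(m):
        completed = [
            yi if o else intercept + slope * xi + rng.gauss(0, resid_sd)
            for xi, yi, o in zip(x, y, observed)
        ]
        estimates.append(statistics.fmean(completed))

    return {"truth": 2.0, "listwise": listwise, "mi": statistics.fmean(estimates)}


result = simulate()
print(result)
```

Under this setup the complete cases over-represent low values of `x`, so the listwise estimate of the mean of `y` falls below the true value of 2, while the pooled imputation estimate should land much closer to it. The study itself varies many more parameters, including the imputation algorithm within MICE, which this sketch does not attempt to reproduce.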
Computer Science and Mathematics, Probability and Statistics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.