Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Just Impute, the Details Don’t Matter?! Quantifying the Influence of (Mostly) Arbitrary Decisions on the Quality of Imputation Results

Version 1: Received: 14 June 2024 / Approved: 15 June 2024 / Online: 17 June 2024 (08:33:48 CEST)

How to cite: Bittmann, F. Just Impute, the Details Don’t Matter?! Quantifying the Influence of (Mostly) Arbitrary Decisions on the Quality of Imputation Results. Preprints 2024, 2024061064. https://doi.org/10.20944/preprints202406.1064.v1

Abstract

Multiple imputation by chained equations (MICE) is a popular and well-researched statistical approach to account for missing data in empirical research. However, applying the technique requires researchers to set dozens of smaller and larger parameters to build an adequate imputation model before estimating the analytical model of interest. It is somewhat unclear how severely small and potentially arbitrary decisions influence the quality of the produced findings. The current study tests this empirically using a simulation approach with datasets with low and high shares of missing data points that are missing at random (MAR). For each of the two specifications, 4,200 simulations are conducted where multiple parameters are randomly varied to create a diverse range of potential imputation models. The results demonstrate that even random imputation models can significantly reduce bias compared to listwise deletion since more than 90% of all simulations give less biased results. All minor decisions taken together explain between 11% and 21% of the variation in the statistics of interest. The most substantial influence on this variation is due to the selection of the imputation algorithm within MICE. These findings demonstrate that almost any imputation model, even those built with little care, will often result in a higher quality than applying listwise deletion, as long as critical assumptions, such as MAR and the avoidance of severe misspecifications, hold.
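The comparison at the heart of the abstract, imputation versus listwise deletion under MAR, can be illustrated with a small sketch. This is not the study's simulation design: the data-generating model, the logistic missingness mechanism, the sample size, the number of repetitions, and the choice of the mean of `y` as the statistic of interest are all assumptions made for illustration. The imputation step is a simplified stochastic regression imputation with a single incomplete variable, not a full MICE procedure (it does not redraw the regression parameters for each imputation).

```python
import numpy as np

TRUE_MEAN_Y = 1.0  # assumed population mean of y in this toy setup


def simulate_once(rng, n=2000, n_imputations=20):
    """One toy replication: returns (listwise-deletion mean, imputed mean) of y."""
    # Hypothetical data-generating model: x is fully observed, y depends on x.
    x = rng.normal(size=n)
    y = TRUE_MEAN_Y + 0.8 * x + rng.normal(size=n)

    # MAR: the probability that y is missing depends only on the observed x.
    p_miss = 1.0 / (1.0 + np.exp(-(x - 0.5)))
    missing = rng.random(n) < p_miss
    observed = ~missing

    # Listwise deletion: use only the cases where y is observed.
    mean_listwise = y[observed].mean()

    # Fit y ~ x on the observed cases.
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    beta, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    residual_sd = (y[observed] - design @ beta).std(ddof=2)

    # Fill in missing y with prediction plus noise, several times, then average.
    estimates = []
    for _ in range(n_imputations):
        y_filled = y.copy()
        noise = rng.normal(scale=residual_sd, size=missing.sum())
        y_filled[missing] = beta[0] + beta[1] * x[missing] + noise
        estimates.append(y_filled.mean())

    return mean_listwise, float(np.mean(estimates))


def run_simulation(n_reps=200, seed=0):
    """Average bias of both estimators over n_reps replications."""
    rng = np.random.default_rng(seed)
    results = np.array([simulate_once(rng) for _ in range(n_reps)])
    bias_listwise, bias_imputed = results.mean(axis=0) - TRUE_MEAN_Y
    return float(bias_listwise), float(bias_imputed)


if __name__ == "__main__":
    bias_listwise, bias_imputed = run_simulation()
    print(f"bias, listwise deletion: {bias_listwise:.3f}")
    print(f"bias, imputation:        {bias_imputed:.3f}")
```

In this toy setup the listwise-deletion mean is pulled downward because cases with large `x`, and therefore large `y`, are missing more often, while the imputed mean stays close to the assumed true value. Whether a comparable gap appears in other settings depends on the missingness mechanism and on how the imputation model is specified.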

Keywords

MICE; missing data; imputation; item-nonresponse; simulation study; dominance analysis

Subject

Computer Science and Mathematics, Probability and Statistics
