Version 1: Received: 17 May 2021 / Approved: 17 May 2021 / Online: 17 May 2021 (14:35:18 CEST)
How to cite:
Hounmenou, C. G.; Behingan, B. M.; Chrysostome, C.; Gneyou, K. E.; Glele Kakaï, R. L. Robustness of Imputation Methods with Backpropagation Algorithm in Nonlinear Multiple Regression. Preprints 2021, 2021050390. https://doi.org/10.20944/preprints202105.0390.v1
APA Style
Hounmenou, C. G., Behingan, B. M., Chrysostome, C., Gneyou, K. E., & Glele Kakaï, R. L. (2021). Robustness of Imputation Methods with Backpropagation Algorithm in Nonlinear Multiple Regression. Preprints. https://doi.org/10.20944/preprints202105.0390.v1
Chicago/Turabian Style
Hounmenou, C. G., B. M. Behingan, C. Chrysostome, K. E. Gneyou and R. L. Glele Kakaï. 2021. "Robustness of Imputation Methods with Backpropagation Algorithm in Nonlinear Multiple Regression." Preprints. https://doi.org/10.20944/preprints202105.0390.v1
Abstract
Missing observations are one of the most important issues in data analysis in applied research studies. Their magnitude and structure affect parameter estimation in modeling, with important consequences for decision-making. This study evaluates the efficiency of imputation methods combined with the backpropagation algorithm in a nonlinear regression context. The evaluation is conducted through a simulation study covering five sample sizes (50, 100, 200, 300 and 400), five missing-data rates (10, 20, 30, 40 and 50%) and three missingness mechanisms (MCAR, MAR and MNAR). Four imputation methods (Last Observation Carried Forward, Random Forest, Amelia and MICE) were used to impute the datasets before making predictions with backpropagation. A three-layer multilayer perceptron (3-MLP) model was used, varying the activation functions (Logistic-Linear, Logistic-Exponential, TanH-Linear and TanH-Exponential), the number of nodes in the hidden layer (3-15) and the learning rate (20-70%). Analysis of the performance criteria of the network (R2, r and RMSE) revealed good performance when it is trained with TanH-Linear functions, 11 nodes in the hidden layer and a learning rate of 50%. MICE and Random Forest were the most appropriate methods for data imputation. These methods can tolerate up to a 50% missing rate with an optimal sample size of 200.
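The pipeline described in the abstract (inject missingness, impute, then fit a backpropagation-trained MLP and score R2/RMSE) can be sketched as follows. This is a minimal illustration, not the authors' exact setup: it uses scikit-learn's IterativeImputer as a MICE-style chained-equations stand-in, a simulated nonlinear dataset, and one configuration from the study (n = 200, 30% MCAR, 11 tanh hidden nodes with a linear output).

```python
# Illustrative sketch of the imputation + backpropagation pipeline.
# The data-generating function and all hyperparameters are assumptions,
# not the paper's exact simulation design.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n = 200                                   # one of the studied sample sizes
X = rng.uniform(-2, 2, size=(n, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)  # nonlinear target

# MCAR mechanism: each cell of X is deleted independently with probability 0.3
X_miss = X.copy()
X_miss[rng.uniform(size=X.shape) < 0.3] = np.nan

# MICE-style imputation via chained equations
X_imp = IterativeImputer(random_state=0).fit_transform(X_miss)

# 3-MLP: one hidden layer of 11 tanh nodes, linear output (TanH-Linear),
# trained by backpropagation
mlp = MLPRegressor(hidden_layer_sizes=(11,), activation="tanh",
                   max_iter=2000, random_state=0).fit(X_imp, y)
pred = mlp.predict(X_imp)
rmse = mean_squared_error(y, pred) ** 0.5
print(f"R2 = {r2_score(y, pred):.3f}, RMSE = {rmse:.3f}")
```

Swapping IterativeImputer for a Random Forest-based imputer (e.g. a missForest-style scheme) or repeating the loop over sample sizes, missing rates and mechanisms reproduces the structure of the simulation study.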
Computer Science and Mathematics, Algebra and Number Theory
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.