Version 1
Received: 26 July 2024 / Approved: 26 July 2024 / Online: 26 July 2024 (14:52:15 CEST)
How to cite:
Chigodora, F.; Mlambo, F. F.; Hove, H. Comparison of Statistical and Machine Learning Methods for Analysing Traffic Accident Fatalities. Preprints 2024, 2024072197. https://doi.org/10.20944/preprints202407.2197.v1
APA Style
Chigodora, F., Mlambo, F. F., & Hove, H. (2024). Comparison of Statistical and Machine Learning Methods for Analysing Traffic Accident Fatalities. Preprints. https://doi.org/10.20944/preprints202407.2197.v1
Chicago/Turabian Style
Chigodora, F., Farai Fredric Mlambo, and Herbert Hove. 2024. "Comparison of Statistical and Machine Learning Methods for Analysing Traffic Accident Fatalities." Preprints. https://doi.org/10.20944/preprints202407.2197.v1
Abstract
Logistic Regression and Random Forest were used to identify risk factors that influence traffic accident fatalities in the United Kingdom. Mean decrease accuracy was used to measure variable importance. Speed limit, police attendance and quarter had an increasing influence on accident fatalities, with mean decrease accuracy values of 102.1669, 221.5322 and 120.894, respectively. In the Logistic Regression model, speed limit had a parameter estimate of 0.0046902 and a standard deviation of 0.0004875. Light Conditions: Night had a parameter estimate of 1.2657635 and a standard deviation of 0.0118409. Road Type: Roundabout had a parameter estimate of -0.4055796 and a standard deviation of 0.0210848. Police Attendance: Yes had a parameter estimate of 0.8546232 and a standard deviation of 0.0151043. The best predictors were speed limit, police attendance and quarter, since their p-values were less than 0.05. The findings of the study indicated that Logistic Regression had a higher accuracy rate (79.85%) than Random Forest (64.00%). A split test was used, and a standard deviation of 0.0010486 was obtained for the Logistic Regression model.
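As a reading aid, the parameter estimates quoted above can be converted to odds ratios. The sketch below assumes the estimates are on the log-odds scale of a standard logistic regression, which the abstract does not state explicitly. The dictionary `estimates` and the helper `odds_ratio` are hypothetical names introduced only for this illustration.

```python
import math

# Parameter estimates quoted in the abstract.
# Assumption: these are log-odds coefficients from a standard logistic regression.
estimates = {
    "Speed limit": 0.0046902,
    "Light Conditions: Night": 1.2657635,
    "Road Type: Roundabout": -0.4055796,
    "Police Attendance: Yes": 0.8546232,
}


def odds_ratio(beta: float) -> float:
    """Convert a log-odds coefficient into an odds ratio via exp(beta)."""
    return math.exp(beta)


# Odds ratio for each predictor (hypothetical helper output, not a result from the paper).
odds_ratios = {name: odds_ratio(beta) for name, beta in estimates.items()}

for name, ratio in odds_ratios.items():
    # A ratio above 1 means higher odds of a fatality, below 1 means lower odds.
    direction = "higher" if ratio > 1 else "lower"
    print(f"{name}: odds ratio = {ratio:.4f} ({direction} odds)")
```

Under this assumption, a positive estimate such as the one for Light Conditions: Night corresponds to an odds ratio above 1, while the negative estimate for Road Type: Roundabout corresponds to an odds ratio below 1.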
Keywords
traffic fatalities; logistic regression; random forest
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.