Preprint | Review | Version 1 | Preserved in Portico | This version is not peer-reviewed

A Review of Data Mining Strategies by Data Type, with a Focus on Construction Processes and Health and Safety Management

Version 1: Received: 2 May 2024 / Approved: 6 May 2024 / Online: 7 May 2024 (17:12:23 CEST)

How to cite: Pireddu, A.; Bedini, A.; Lombardi, M.; Ciribini, A. L.; Berardi, D. A Review of Data Mining Strategies by Data Type, with a Focus on Construction Processes and Health and Safety Management. Preprints 2024, 2024050322. https://doi.org/10.20944/preprints202405.0322.v1

Abstract

Information technology increasingly facilitates the storage and management of data useful for risk analysis and event prediction. Studies on data extraction related to occupational health and safety are increasingly available; however, because of its variability, the construction sector warrants special attention. This review was conducted under the research programmes of the National Institute for Occupational Accident Insurance (Inail). Objectives: The research question is which data mining (DM) methods, among supervised, unsupervised, and others, are most appropriate for particular investigation objectives, data types, and data sources, as defined by the authors. Methods: Scopus and ProQuest were the main sources from which we extracted studies in the field of construction published between 2014 and 2023. The eligibility criteria applied in the selection of studies were based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). For exploratory purposes we applied hierarchical clustering, while for in-depth analysis we used principal component analysis (PCA) and meta-analysis. Results: The search strategy, based on the PRISMA eligibility criteria, yielded 61 of 2,234 potential articles, 202 observations, 91 methodologies, 4 survey purposes, 3 data sources, 7 data types, and 3 resource types. Cluster analysis and PCA organized the information in the paper dataset into two labelled dimensions: the first, "supervised methods, institutional dataset, and predictive and classificatory purposes" (correlation 0.97 to 0.818; p-value 7.67E-55 to 1.28E-22), and the second, Dim2, "not-supervised methods; project, simulation, literature, text data; monitoring, decision-making processes; machinery and environment" (correlation 0.84 to 0.47; p-value 5.79E-25 to 3.59E-06). This answered the research question of which method, among supervised, unsupervised, or other, is most suitable for application to data in the construction industry. Conclusions: The meta-analysis provided an overall estimate favouring the effectiveness of supervised methods over not-supervised methods (odds ratio = 0.71; confidence interval 0.53 to 0.96).
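As a reading aid for the pooled estimate reported in the abstract, the following minimal Python sketch recovers the log-scale standard error and an approximate two-sided p-value from the odds ratio and its confidence interval. It assumes a Wald-type interval at the 95% level, which the abstract does not state; the function name `log_scale_summary` and the `z_crit` parameter are illustrative and not taken from the paper.

```python
import math

def log_scale_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-scale quantities from a reported odds ratio and its CI.

    Assumption (not stated in the abstract): the interval is a Wald-type
    interval, symmetric on the log scale, at the level implied by z_crit
    (1.959964 corresponds to 95%).
    """
    log_or = math.log(odds_ratio)
    # Half-width of the interval on the log scale divided by the critical value.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    # Geometric midpoint of the interval; it should sit close to the point estimate.
    midpoint = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    z = log_or / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return {
        "log_or": log_or,
        "se": se,
        "midpoint": midpoint,
        "z": z,
        "p_two_sided": p_two_sided,
    }

# Values as reported in the abstract: OR = 0.71, CI 0.53 to 0.96.
summary = log_scale_summary(0.71, 0.53, 0.96)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the geometric midpoint of the interval lies close to the reported point estimate, which suggests the interval was built on the log scale, and the interval excludes 1. The sketch says nothing about the direction of the comparison, which depends on how the authors defined the odds.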

Keywords

Clustering; Principal Component Analysis (PCA); Meta-Analysis; Construction Industry; Data Mining; Machine Learning; Prediction Models; Workplace Safety; Smart Technology (ST); State-of-the-art

Subject

Engineering, Civil Engineering
