Preprint Article, Version 1. This version is not peer-reviewed.

Precipitation Return Period Estimation using Random Forest: A Comparative Analysis with Probability Density Functions using Outdated Weather Station Data

Version 1 : Received: 31 October 2024 / Approved: 1 November 2024 / Online: 7 November 2024 (07:13:00 CET)

How to cite: Anco-Valdivia, J.; Valencia-Félix, S.; Espinoza Vigil, A. J.; Anco, G.; Booker, J.; Juarez-Quispe, J.; Rojas-Chura, E. Precipitation Return Period Estimation using Random Forest: A Comparative Analysis with Probability Density Functions using Outdated Weather Station Data. Preprints 2024, 2024110492. https://doi.org/10.20944/preprints202411.0492.v1

Abstract

The precipitation associated with a specific return period plays an important role in the design of hydraulic infrastructure. The traditional approach involves collecting annual maximum precipitation data from a station, fitting several probability density functions (PDFs) to it, and selecting the one with the lowest value of a goodness-of-fit test statistic (e.g., Kolmogorov-Smirnov). However, this methodology assumes that the data are current, leaving uncertainty about its suitability for outdated records. This study compares PDFs (e.g., Normal, Log Normal, Pearson III) with the Random Forest (RF) machine learning algorithm for calculating precipitation at different return periods, using five weather stations located in different parts of the province of Arequipa, Peru. The performance of both methods was evaluated with the root mean square error (RMSE). In most cases, RF yielded a lower RMSE than the PDFs when calculating precipitation for return periods of 2, 5, 10, 20, 50 and 100 years at the studied stations.
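
The traditional workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the sample values are hypothetical, only the Normal and Log Normal distributions are fitted (by the method of moments, which is an assumption), Pearson III and Random Forest are omitted, and comparing fitted quantiles against the sorted sample at Weibull plotting positions is one possible way to compute an RMSE, not necessarily the one used in the study. The only fact relied on is the standard definition that a return period of T years corresponds to a non-exceedance probability of 1 - 1/T.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical annual maximum daily precipitation (mm); NOT data from the study.
annual_max = [12.4, 18.9, 9.7, 25.3, 14.1, 31.8, 11.2, 20.5, 16.7, 22.0, 8.5, 27.6]


def fit_normal(sample):
    """Fit a Normal distribution by the method of moments (assumed fitting method)."""
    return NormalDist(mean(sample), stdev(sample))


def fit_lognormal(sample):
    """Fit a Log Normal distribution as a Normal on the log-transformed sample."""
    logs = [math.log(x) for x in sample]
    return NormalDist(mean(logs), stdev(logs))


def return_level(dist, T, log_space=False):
    """Precipitation whose non-exceedance probability is 1 - 1/T."""
    q = dist.inv_cdf(1.0 - 1.0 / T)
    return math.exp(q) if log_space else q


def ks_statistic(sample, dist, log_space=False):
    """One-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDF and the fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(math.log(x) if log_space else x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def quantile_rmse(sample, dist, log_space=False):
    """RMSE between the sorted sample and fitted quantiles at Weibull
    plotting positions i / (n + 1) (an assumed evaluation scheme)."""
    xs = sorted(sample)
    n = len(xs)
    fitted = []
    for i in range(1, n + 1):
        q = dist.inv_cdf(i / (n + 1))
        fitted.append(math.exp(q) if log_space else q)
    return rmse(xs, fitted)


normal = fit_normal(annual_max)
lognormal = fit_lognormal(annual_max)

# Candidate with the smaller KS statistic is selected, as in the traditional approach.
candidates = {
    "Normal": ks_statistic(annual_max, normal),
    "Log Normal": ks_statistic(annual_max, lognormal, log_space=True),
}
best = min(candidates, key=candidates.get)

for T in (2, 5, 10, 20, 50, 100):
    print(T,
          round(return_level(normal, T), 1),
          round(return_level(lognormal, T, log_space=True), 1))
print("Selected by KS statistic:", best)
```

With real station records, `annual_max` would hold one maximum per year, and the same `rmse` helper could score any alternative estimator, such as a Random Forest model, against the same reference values.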

Keywords

probability distributions; return period; random forest; supervised learning; annual maximum rainfall; goodness of fit test

Subject

Engineering, Civil Engineering
