Rainfall and Daily Average Temperature Prediction using Machine Learning: A Case Study of Bangladesh

Preprint

Article

This version is not peer-reviewed.

Adil Hussain^*,Sajib Tripura,Ayesha Aslam

Submitted:

27 February 2024

Posted:

27 February 2024

A peer-reviewed article of this preprint also exists.

Abstract

Heavy rainfall poses significant threats to human health and life. Floods and other natural disasters, which have a global impact every year, can be attributed to extended periods of intense precipitation. Accurate rainfall predictions are crucial in nations such as Bangladesh, where agriculture is the predominant occupation. Because rainfall is non-linear, machine learning (ML) methods tend to be more effective than alternative approaches, and individual classifiers have been found to be less accurate than ensemble learning (EL) methods. This research implements machine learning classifiers and an ensemble-based classifier to predict rainfall occurrence, along with machine learning regressors and an ensemble-based regressor to predict rainfall amount and daily average temperature, using the Bangladesh Weather Dataset. The machine learning and ensemble-based classifiers are compared using accuracy and F1 score for rainfall occurrence prediction, whereas MAE and RMSE are used to evaluate the regression algorithms and the ensemble regressor for rainfall amount and daily average temperature prediction. With an accuracy of 83.41% and a recall of 78.17%, the Ensemble Classifier is the best at predicting rainfall occurrence, although its precision of 51.16% is tied for the lowest. The Ensemble Regression model outperforms Linear Regression, Random Forest, and SVR in rainfall amount prediction, with the lowest MAE of 0.36 and RMSE of 0.90. It also provides the most precise results for daily average temperature prediction, with the lowest MAE of 0.42 and RMSE of 0.54, highlighting its advantage over the other regression models in forecasting temperature with less error. Overall, the ensemble approaches lead on most of the reported performance metrics.

Subject:

Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Rainfall has always been important, both historically and today. It is influenced by various factors, including humidity, temperature, water level, and other relevant variables [1]. Excessive rainfall damages crops within the affected area and can ultimately disrupt agricultural activity there completely. Extremes of precipitation are associated with natural disasters such as floods, droughts, and cloudbursts, and intense rainfall often triggers rapid landslides [2]. Rainfall prediction involves anticipating long-term precipitation patterns within a specific geographical area. Accurate rainfall forecasts can contribute significantly to the success of the agricultural industry and thereby foster economic growth within a nation. Precise rainfall measurements are also crucial for mitigating landslides, which frequently obstruct river channels [3]. The variability of precipitation is a significant and intricate issue. Conventional rainfall forecasting has a limited ability to identify the hidden patterns and non-linear trends in rainfall data that are essential for accurate predictions. Many rainfall forecasts have proved inaccurate, resulting in substantial economic losses. Diverse climatic conditions also contribute to the deterioration of infrastructure and to injuries and fatalities. Hence, precise rainfall forecasts are imperative for anticipating the impact of weather conditions on large-scale activities [4].

Weather forecasting is a subcategory of climate change study that predicts the state of the atmosphere at a certain time and location in the future [5]. Rainfall prediction is a key application of weather forecasting for large-scale water-dependent operations, including food production planning and water resource management. To prepare and plan such activities properly, rainfall forecasts must be improved, especially in accuracy and predictive performance. Machine learning has advanced significantly and has become a fundamental sub-discipline within the broader domain of artificial intelligence. It allows computers to acquire knowledge from data without being explicitly programmed. Machine learning algorithms can therefore be employed to derive significant insights from weather data, facilitating efficient rainfall prediction. Nevertheless, the current level of achievement remains far from human-level ability, and human intervention is still required to configure the algorithms before training. Various machine learning methodologies have been investigated in the context of rainfall prediction for diverse geographical regions, including South Africa, China, and other nations [6,7,8,9]. Various classifiers are employed for rainfall prediction, including Random Forest, Decision Tree, Support Vector Machine, K-nearest Neighbour (KNN), and Naïve Bayes [10,11,12,13,14,15].

Accurate trend detection and future prediction are vital. The variation in rainfall, humidity, wind speed, temperature over time, space, and aggregate can significantly impact the country's agriculture, potentially causing substantial economic setbacks [16]. Countries like Bangladesh, with diverse landscapes, face the challenge of predicting the ever-changing weather conditions involving key parameters like wind speed, humidity, temperature, and rainfall [17]. Therefore, rainfall trend detection and future prediction remain an important field of study for Bangladesh. The weather, including temperature, wind direction, speed, and amount of rainfall, can be predicted using machine learning algorithms [18]. This research work uses machine learning algorithms and implements ensemble-based regression and classifier models using the Bangladesh weather dataset to perform the weather predictions, including rainfall occurrence, rainfall amount, and daily average temperature prediction. The main contributions of this work are:

To implement an ensemble-based predictive classifier to predict whether the rain will occur on a particular day.
To implement an ensemble-based predictive regressor to predict the amount of rainfall and daily average temperature.
To evaluate the performance of the ensemble-based models against the individual machine learning algorithms using accuracy, precision, recall, and F1 score for classification, along with RMSE and MAE for regression.

2. Related Work

Intelligent weather prediction techniques can provide valuable insights, enabling effective decisions that save lives, time, and property. Bosu, et al. [19] analyzed the recent changes in temperature and rainfall in different areas of Bangladesh from 1981 to 2019 using the CMIP5 dataset. The authors of [4] examined the trends and variations in Bangladesh's inter-annual, monthly, and dry-season rainfall patterns by applying ARIMA predictions and conducting Mann-Kendall and Spearman's rho tests. Mahabub and Habib [17] experimented with the raw dataset collected from the Bangladesh Meteorological Division (BMD), applied regression algorithms in machine learning (ML) models, and achieved more precise results than traditional weather forecasting approaches. Hashim, et al. [20] successfully predicted precipitation from wind, temperature, pressure, and relative humidity using a back-propagation neural network model. Dong, et al. [21] enhanced short-term forecasting of daily precipitation using the XGBoost model combined with multi-factor bias correction for numerical weather prediction (NWP). Paul and Roy [22] developed a machine learning-based time series forecasting model to predict the temperature in Bangladesh for any future year.

In the past, individual classifiers such as Decision Trees (DT), Multilayer Perceptrons (MLP), Naïve Bayes (NB), K-nearest neighbor (KNN), Neural Networks (NN), and Support Vector Machines (SVM) were used to create prediction models on pre-labeled datasets for rainfall prediction [11,23,24,25,26]. An individual classifier faces some constraints. For instance, when the data is unstructured and intricate and has many features, the problem may be non-linear. When the data is also high-dimensional while the dataset is quite small, this combination of factors can result in overfitting and a lack of interpretability during training. This is the primary limitation of using an individual classifier within a prediction model. The reliability of an individual classifier is lower than the reliability achieved by combining several classifiers. To address this limitation, numerous scholars propose Ensemble Learning techniques, which offer superior classification accuracy compared to individual classifiers. Ensemble learning is a machine learning methodology used to enhance the accuracy of predictions [27]. Ensemble approaches are commonly considered among the most sophisticated methodologies for forecasting precipitation. These strategies train several models and aggregate their forecasts to improve on the predictive accuracy of a single model, which generally produces a more robust model. Ensemble learning techniques have been successfully employed in rainfall prediction, resulting in notable improvements in prediction accuracy. This advancement can potentially mitigate the risk of substantial losses. Ensemble learning combines the predictive capacity of numerous classifiers to provide enhanced prediction results on a given dataset [28,29]. The overview of related work is provided in Table 1.

Most of the related work uses traditional machine learning models only, addresses a single task (rainfall amount, rainfall occurrence, or average temperature), and uses smaller datasets. However, rainfall occurrence prediction is necessary for countries like Bangladesh to mitigate challenges related to flooding. To increase the accuracy of the machine learning models, this study proposes an ensemble-based classifier built on five machine learning techniques for rainfall occurrence prediction, along with an ensemble-based regressor built on three regression algorithms for rainfall amount and daily average temperature prediction.

Absolute independence among the base classifiers is difficult to guarantee. Ensemble learning has reduced interpretability, and the predictions and explanations of an ensemble are challenging to anticipate. Applying ensemble learning well is also challenging, as a poorly constructed ensemble can exhibit worse predictive accuracy than an individual model, so losses remain possible. Two distinct types of ensemble learning methods exist: heterogeneous and homogeneous. Homogeneous approaches use identical base learners on distinct subsets of samples within a given dataset [30]. Bagging, boosting, and random forests are typical examples. The heterogeneous strategy uses diverse base learners, whose outputs are combined either by statistical methods or by aggregating the predictions of the individual base learners. Ensemble learning has emerged as a prominent approach in rainfall prediction.

3. Methodology

This research uses machine learning models to predict rainfall occurrence, rainfall amount, and daily average temperature. Each model has its own methodology, strengths, and capabilities. Our approach incorporates both classification and regression algorithms to address the different parts of this multifaceted task. The models used to predict rainfall occurrence are Logistic Regression, Decision Tree Classification, Random Forest Classification, K-Nearest Neighbors (KNN), and Support Vector Classifier (SVC). The regression algorithms used to predict rainfall amount and daily average temperature are Linear Regression, Random Forest Regression, and Support Vector Regression (SVR). To improve the results further, the research also implements an ensemble classifier built from the classification models for rainfall occurrence prediction and an ensemble regressor built from the regression algorithms for rainfall amount and daily average temperature prediction. SVC is used for rainfall occurrence prediction because it performs binary classification, whereas SVR is used for rainfall amount and daily average temperature prediction because it performs regression. The methodology is shown in Figure 1.

3.1. Ensemble-Based Models

To achieve comprehensive predictions, our methodology also involves an ensemble classifier. The ensemble classifier operates on a principle of consensus and voting, which helps it overcome the limitations of any single model. The individual classifiers generate independent predictions, and the ensemble classifier combines them into a single output that represents the collective decision of the ensemble. Specifically, the ensemble classifier predicts rainfall occurrence based on historical data patterns. Aggregating the classifiers in this way strengthens the predictive ability of the ensemble and diminishes the influence of outliers or biases present in the individual models. Rainfall occurrence prediction is well suited to classifier algorithms such as Support Vector Classifiers, Logistic Regression, KNN, Decision Trees, and Random Forests. The task classifies each day as rain or no rain based on the input features, and classifier algorithms are designed to generate such discrete class labels. They can detect patterns and class boundaries, which makes them suitable rainfall occurrence predictors. The Ensemble-based classifier model used in this research work for rainfall occurrence prediction is based on five machine-learning classifiers, as shown in Figure 2.
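The voting principle described above can be illustrated with a minimal sketch. It assumes simple majority (hard) voting over binary labels; the paper does not state whether hard or soft voting was used, and the function name and the five prediction lists below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Return one label per sample: the label predicted by most models."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        # Collect the vote of every model for sample i.
        votes = [preds[i] for preds in predictions_per_model]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of five classifiers for four days (1 = rain, 0 = no rain).
model_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
ensemble_prediction = majority_vote(model_predictions)
print(ensemble_prediction)
```

With an odd number of voters and two classes, ties cannot occur, which is one practical reason to combine five classifiers.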

The Ensemble-based regressor model used for rainfall amount and daily average temperature prediction is based on three machine-learning regression algorithms, as shown in Figure 3. Regression techniques such as Linear Regression, Random Forest, and Support Vector Regressor are well suited to both tasks, since both aim to estimate a continuous numerical quantity (rainfall or temperature) from input features. Regression algorithms are designed to capture and model such patterns and correlations in the data. Their capacity to handle continuous output variables and adapt to complex data patterns makes them a suitable choice for predicting rainfall amount and daily temperature.
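One common way to combine several regressors is to average their predictions. The sketch below assumes unweighted averaging, which the paper does not confirm; the function name and the numbers are hypothetical.

```python
def average_predictions(predictions_per_model):
    """Average the predictions of several regressors, sample by sample."""
    n_models = len(predictions_per_model)
    return [sum(values) / n_models for values in zip(*predictions_per_model)]


# Hypothetical rainfall predictions (mm) from three regressors for three days.
rainfall_mm = average_predictions([
    [0.0, 2.0, 6.0],  # e.g. Linear Regression
    [0.3, 1.0, 9.0],  # e.g. Random Forest
    [0.0, 3.0, 3.0],  # e.g. SVR
])
print(rainfall_mm)
```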

Ensemble-based forecasting techniques have emerged as a crucial tool for improving the accuracy and reliability of weather predictions, particularly in regions with dynamic and complex climatic patterns like Bangladesh. Our approach integrates multiple models and their outputs to generate a more robust and comprehensive forecast. In Bangladesh, accurate rainfall and temperature predictions are paramount, as they directly affect sectors such as agriculture, water resource management, and disaster preparedness. This study explores the significance and applications of ensemble-based forecasting methods in addressing Bangladesh's meteorological challenges, highlighting the potential of such models to improve the country's resilience to changing weather patterns.

3.2. Dataset

Weather Data Bangladesh is an open-source dataset that contains 10 years of daily weather observations from many locations across Bangladesh [31]. It contains observations of weather metrics for each day from 2013 to 2022. The dataset columns include Date, MinTemp, MaxTemp, WindDir9am, WindDir3pm, Windspeed9am, windspeed3pm, humidity9am, humidity3pm, pressure9am, pressure3pm, cloud9am, cloud3pm, temp9am, temp3pm, and rainToday. The rainToday column is 'Yes' if it rained on that day and 'No' otherwise.

3.3. Evaluation Metrics

The machine learning models implemented in this research are evaluated using accuracy, precision, recall, F1 score, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Accuracy and F1 score, together with precision and recall, are used for rainfall occurrence prediction, whereas MAE and RMSE are used for rainfall amount and daily average temperature prediction.
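For reference, the metrics named above can be computed from their standard definitions as in the following sketch. The helper names and the small example arrays are hypothetical and are not taken from the paper's experiments.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = rain)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """Mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Hypothetical labels and predictions, for illustration only.
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0, 1, 0])
reg = regression_metrics([0.0, 2.0, 5.0, 1.0], [0.0, 1.0, 7.0, 1.0])
print(clf, reg)
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, which is why a model can have a low MAE but a comparatively high RMSE.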

4. Data Analysis

To analyze the data further, a comprehensive analysis is performed for various attributes, including minimum and maximum temperature, wind speed, humidity, pressure, cloud cover, and temperature. For wind speed, humidity, pressure, cloud cover, and temperature, the dataset provides values only at 9 AM and 3 PM.

4.1. Feature Distribution

The dataset's minimum temperature ranges from 4.3 to 27.6 degrees Celsius, with the largest frequency at 11 and 20 degrees Celsius, as illustrated in Figure 4a. However, as Figure 4b illustrates, the maximum temperature ranges from 11.7 degrees Celsius to 45.8 degrees Celsius, with the largest frequency in the dataset occurring at 25 degrees Celsius.

The wind speed ranges from 0 to 57 km/h at 9 AM, with 12 km/h having the largest frequency in the dataset, as shown in Figure 5a. Figure 5b shows that the wind speed range at 3 PM is between 0 km/h and 57 km/h, with the highest frequency occurring at 19 km/h.

The humidity ranges from 19% to 100% across the 9 AM and 3 PM observations. A humidity of 70% has the highest frequency at 9 AM, as shown in Figure 6a, whereas Figure 6b shows that a humidity of 47.58% has the highest frequency at 3 PM.

The atmospheric pressure ranges from 980.5 to 1042 hectopascals (hPa) at 9 AM. Figure 7a illustrates that a pressure of 1024.68 hPa has the highest frequency in the dataset. Figure 7b illustrates that the pressure at 3 PM ranges from 988.2 hPa to 1039.6 hPa, with 1015.28 hPa having the highest frequency.

4.2. EDA

In this section, the average wind speed, average humidity, and average temperature are analyzed per month.
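A per-month average of this kind pools all observations that share a calendar month, regardless of year. The following is a minimal sketch of that grouping; the function name and the sample observations are hypothetical, and the paper does not describe how its averages were computed.

```python
from collections import defaultdict
from datetime import date


def monthly_average(observations):
    """Average a daily measurement per calendar month, pooled over all years."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum, count]
    for day, value in observations:
        totals[day.month][0] += value
        totals[day.month][1] += 1
    return {month: total / count
            for month, (total, count) in sorted(totals.items())}


# Hypothetical 9 AM wind speeds in km/h, for illustration only.
observations = [
    (date(2013, 1, 1), 14.0),
    (date(2013, 1, 2), 16.0),
    (date(2013, 2, 1), 15.0),
    (date(2014, 1, 1), 18.0),
]
wind_by_month = monthly_average(observations)
print(wind_by_month)
```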

4.2.1. Average Wind Speed Analysis

The monthly average wind speed data shows how wind patterns change with the seasons. Wind speeds are moderate in the first months of the year: at 9 a.m., the average wind speed was about 15.29 km/h in January, 15.47 km/h in February, and 15.99 km/h in March. The average continues to rise through spring, reaching 16.47 km/h in April and peaking at 16.58 km/h in May. From June onward the wind weakens: the 9 a.m. average drops to 15.08 km/h in June, 14.61 km/h in July, and 13.65 km/h in August, the lowest value of the year. Wind speeds then recover slowly, with 9 a.m. averages of 13.82 km/h in September and 13.90 km/h in October, followed by 14.92 km/h in November and 15.21 km/h in December. The average wind speed by month is shown in Table 2.

The average wind speed analysis is shown in Figure 8.

4.2.2. Average Humidity Analysis

According to the dataset, the average relative humidity follows a regular seasonal pattern. In January, the humidity starts out high, averaging 73.57% at 9 a.m. and 57.14% at 3 p.m. It then declines gradually through February and March, reaching about 68.52% at 9 a.m. and 52.70% at 3 p.m. in March. The decline continues in April (67.29% at 9 a.m. and 52.37% at 3 p.m.) and May (64.01% at 9 a.m. and 49.91% at 3 p.m.). In June, the humidity rises again, to about 66.43% at 9 a.m. and 54% at 3 p.m. July and August stay at a similar level: the average humidity is 64.16% at 9 a.m. and 52.87% at 3 p.m. in July, and 65.18% at 9 a.m. and 52.33% at 3 p.m. in August. In September, the humidity is about 65.84% at 9 a.m. and 56.10% at 3 p.m. The last quarter of the year is more humid, with averages of 71.20% at 9 a.m. and 59.14% at 3 p.m. in October, 70.50% at 9 a.m. and 58.51% at 3 p.m. in November, and 70.01% at 9 a.m. and 55.21% at 3 p.m. in December. The average humidity per month is shown in Table 3.

The average monthly humidity analysis is shown in Figure 9.

4.2.3. Average Temperature Analysis

The average temperature data also shows seasonal variation. In January and February, the average maximum temperature is about 22.6°C, with pleasant afternoon temperatures. In March and April, the average is about 21.3°C, and in May it is about 21.9°C. From June to August, the temperature rises gradually, with July and August averaging about 22.5°C. The temperature then decreases slightly from 24.8°C in September to 24.4°C in October. November and December, the last two months of the year, are usually mild, with averages of about 24.4°C and 24.1°C, respectively. The average temperature per month is shown in Table 4.

The month-wise average temperature analysis is shown in Figure 10.

5. Implementation

This section gives an overview of data preprocessing (standardizing and transforming the variables), followed by the setup for rainfall occurrence prediction, rainfall amount prediction, and daily average temperature prediction.

5.1. Data Processing

The dataset used in this research work is collected from Kaggle. The dataset consists of Bangladesh weather data of 10 years with daily weather observations from many locations across the country, with data for each day from 2013 to 2022.

5.1.1. Standardize the Variables

The scale of the variables is significant because distance-based models predict the class or value of a test observation from the observations nearest to it, and variables with a large scale exert a considerably greater influence on the distance between observations. Standard Scaler is therefore used to standardize the variables during pre-processing. The scaler calculates the mean and standard deviation of each feature, then subtracts the mean from each value and divides the result by the standard deviation. The transform method of Standard Scaler applies this scaling using the fitted mean and standard deviation.
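The standardization step can be sketched in plain Python as follows. This assumes the population standard deviation, which is what scikit-learn's StandardScaler uses, and assumes the statistics are fitted on training data and then reused for test data; the helper names and pressure values are hypothetical.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return [(x - mean) / std for x in column]


# Hypothetical 9 AM pressure readings in hPa, for illustration only.
train_pressure = [1000.0, 1010.0, 1020.0]
mean, std = fit_scaler(train_pressure)

scaled_train = transform(train_pressure, mean, std)
# Test data is scaled with the statistics learned from the training data.
scaled_test = transform([1015.0], mean, std)
print(scaled_train, scaled_test)
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step. A constant feature (standard deviation of zero) would need separate handling, which this sketch omits.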

5.1.2. Transforming Categorical Variables

First, the categorical features are transformed into binary indicator variables using the get_dummies() method in pandas. The target variable 'RainToday' is handled differently: its categorical values are substituted directly with binary values, which turns the column from a categorical representation into a binary one. The get_dummies() method is not applied to 'RainToday' because it would create two redundant columns for the target variable.
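A minimal sketch of the two encoding steps is shown below. The tiny data frame is hypothetical; the column names follow the dataset description, and the choice to encode 'Yes' as 1 and 'No' as 0 is an assumption.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
df = pd.DataFrame({
    "WindDir9am": ["N", "SE", "N"],
    "humidity9am": [70, 55, 82],
    "RainToday": ["Yes", "No", "Yes"],
})

# One indicator column per category of the categorical feature.
encoded = pd.get_dummies(df, columns=["WindDir9am"], dtype=int)

# The target is mapped directly to 0/1 so that it stays a single column.
encoded["RainToday"] = encoded["RainToday"].map({"Yes": 1, "No": 0})
print(encoded)
```

Applying get_dummies() to the target would instead produce both a RainToday_Yes and a RainToday_No column, each fully determined by the other.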

5.2. Rainfall Occurrence Prediction

The rainfall occurrence prediction uses machine learning techniques, including Logistic Regression, KNN, Decision Tree, Random Forest, SVC, and Ensemble-based classifier. The RainToday column was selected as the target to predict the occurrence of precipitation. With a test_size of 0.35 and a random_state of 101, the train_test_split function splits the features and Y data frames for training and testing of all the models, including the ensemble classifier.
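The setup described above could look roughly like the following scikit-learn sketch. It uses synthetic data in place of the Bangladesh dataset, assumes hard (majority) voting, and leaves most hyperparameters at library defaults; none of these choices are confirmed by the paper. Only test_size=0.35 and random_state=101 come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the standardized features and the RainToday target.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Split as described in the text: 35% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=101
)

# Five base classifiers combined by majority vote (an assumption).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f} f1={f1:.3f}")
```

The scores printed here describe the synthetic data only and are unrelated to the results reported in Section 6.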

5.3. Rainfall Amount Prediction

The rainfall amount prediction uses regression algorithms, namely Linear Regression, Random Forest, and SVR, together with the Ensemble-based regressor. The rainfall column is the target variable and gives the amount of rain in millimeters (mm) for a given day. With a test_size of 0.35 and a random_state of 101, the train_test_split function splits the features and Y data frames for training and testing all the regressor algorithms, including the ensemble regressor.

5.4. Daily Average Temperature Prediction

The daily average temperature prediction is also performed using regression algorithms, namely Linear Regression, Random Forest, and SVR, together with the ensemble-based regressor. The average temperature for a given day is predicted in degrees Celsius (°C). The Temp9am, Temp3pm, MinTemp, and MaxTemp columns are used to generate a new column, AvgTemp, which is the target variable. With a test_size of 0.35 and a random_state of 101, the train_test_split function splits the features and Y data frames for training and testing all the regressor algorithms, including the ensemble regressor.
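The text does not say how AvgTemp is derived from the four temperature columns. The sketch below assumes a simple unweighted mean of the four values; the rows are hypothetical.

```python
TEMP_COLUMNS = ("MinTemp", "MaxTemp", "Temp9am", "Temp3pm")

# Hypothetical daily records in degrees Celsius, for illustration only.
rows = [
    {"MinTemp": 18.0, "MaxTemp": 30.0, "Temp9am": 22.0, "Temp3pm": 28.0},
    {"MinTemp": 10.0, "MaxTemp": 20.0, "Temp9am": 14.0, "Temp3pm": 18.0},
]

for row in rows:
    # Assumed definition: unweighted mean of the four temperature readings.
    row["AvgTemp"] = sum(row[c] for c in TEMP_COLUMNS) / len(TEMP_COLUMNS)

print([row["AvgTemp"] for row in rows])
```

If AvgTemp is built this way, the four source columns determine the target exactly, so they would need to be excluded from the input features for the prediction task to be meaningful.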

6. Results and Discussion

This section contains the results for rainfall occurrence prediction using the machine learning models and the ensemble classifier, along with the results for rainfall amount and daily average temperature prediction using the regression algorithms and the ensemble regressor.

6.1. Rainfall Occurrence Prediction

When the classification algorithms are compared on accuracy, precision, recall, and F1 score, the Ensemble Classifier shows the highest accuracy (83.41%), meaning that it classifies the largest share of days correctly. However, its precision, the fraction of predicted rainy days that were actually rainy, is 51.16%, the same as that of the Random Forest model. The Ensemble Classifier's recall is high, at 78.17%, which means it identifies most of the actual rainy days, whereas the Decision Tree has a much lower recall of 56.82%. The F1 score, which balances precision and recall, is fairly consistent across the models; the Ensemble Classifier reaches 61.85%. The Logistic Regression model performs well overall, with an F1 score of 62.03%, an accuracy of 82.36%, a precision of 54.82%, and a recall of 71.43%. The SVC is comparable to Logistic Regression in precision and F1 score but has lower accuracy and recall. The KNN algorithm is reported with a high F1 score of 82.53%, although its accuracy is lower, at 80%. The Random Forest algorithm reaches an accuracy of 82.97%, slightly higher than that of the SVC, but it has the lowest precision, at 51.16%. Taken together, these measurements show that the Ensemble Classifier offers the best accuracy and recall at the cost of precision, whereas the Logistic Regression and SVC models are more balanced across the measures. Table 5 contains the accuracy, precision, recall, and F1 score for rainfall occurrence prediction for the machine learning models and the Ensemble classifier.

The comparison of the machine learning classifiers and the ensemble-based classifier in terms of accuracy, precision, recall, and F1 score is shown in Figure 11.

6.1.1. Rainfall Prediction Comparison

The results of rainfall prediction using the ensemble classifier are compared with a similar study [28], which implements an ensemble-based model using multiple machine learning methods, including Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), Neural Network (NN), and Artificial Neural Network (ANN). The comparison covers accuracy, precision, recall, and F1 score. Table 6 shows the comparison of our model's performance with the existing ensemble-based model.

6.2. Rainfall Amount Prediction

The effectiveness of the machine learning regression models is evaluated using MAE and RMSE, two essential metrics for assessing prediction accuracy. The regression algorithms differ significantly in accuracy. Linear Regression exhibits comparatively high MAE and RMSE values, indicating limited predictive accuracy. In contrast, the Random Forest algorithm achieves lower MAE and RMSE values, suggesting more accurate predictions. The SVR model has a lower MAE but a higher RMSE, which indicates room for improvement on days with large errors. The Ensemble Regression model achieves the best combination of MAE and RMSE and consequently outperforms all other assessed models in prediction accuracy. Table 7 contains the MAE and RMSE for rainfall amount prediction using the regression algorithms and the Ensemble regressor.

The comparison of the MAE and RMSE of the regression algorithms and the ensemble regressor is shown in Figure 12.

6.3. Daily Average Temperature Prediction

The performance of the machine learning regression models is evaluated using two metrics: MAE and RMSE. Linear Regression, although it shows a certain degree of predictive capability, has comparatively high MAE and RMSE values. In contrast, the Random Forest model is more accurate, as evidenced by its lower MAE and RMSE. The SVR model also generates accurate predictions, as indicated by its low MAE and RMSE. Nevertheless, the Ensemble Regression model performs better than the other models assessed, with the lowest MAE and RMSE and therefore the most precise predictions among the examined models. Table 8 contains the MAE and RMSE for daily average temperature prediction using the regression algorithms and the Ensemble regressor.

Figure 13 compares the MAE and RMSE of the regression models and the ensemble regressor.
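
One common way to combine regressors is to average their predictions, as a voting regressor such as scikit-learn's `VotingRegressor` does. The sketch below illustrates that idea in plain Python. It assumes simple unweighted averaging, which may differ from the exact combination rule used in this study. The three base models are hypothetical stand-ins for fitted regressors, and the input values are illustrative.

```python
from statistics import mean


def make_averaging_ensemble(models):
    """Return a predictor that averages the predictions of the given models."""
    def predict(x):
        return mean(model(x) for model in models)
    return predict


# Hypothetical stand-ins for fitted base regressors. Each maps
# (min_temp, max_temp) to an estimate of the daily average temperature.
def midpoint_model(x):
    return (x[0] + x[1]) / 2


def warm_biased_model(x):
    return (x[0] + x[1]) / 2 + 1.0


def cold_biased_model(x):
    return (x[0] + x[1]) / 2 - 0.4


ensemble = make_averaging_ensemble(
    [midpoint_model, warm_biased_model, cold_biased_model]
)
prediction = ensemble((14.0, 22.0))
print(round(prediction, 2))
```

Averaging lets errors of opposite sign partly cancel, which is one reason an ensemble can have lower error than most of its members.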

7. Conclusions

Forecasting rainfall involves examining numerous variables, such as temperature, humidity, wind speed, and water level, to estimate where and when it might rain. The most popular techniques in rainfall forecasting are supervised machine learning techniques, which are trained on labeled example data and then make predictions on held-out test data. Finding appropriate mechanisms, balancing the sensitivity of the objective functions, and handling features all present significant challenges for these systems. These differences lead to variable performance, which makes it difficult to choose an appropriate technique for rainfall prediction. This paper uses the Bangladesh Weather Dataset to implement machine learning algorithms and ensemble-based models for weather forecasting, covering rainfall occurrence prediction, rainfall amount prediction, and daily average temperature prediction. Ensemble-based models are used to improve prediction performance. The ensemble model for rainfall occurrence prediction is a voting classifier built from five machine learning algorithms, whereas the ensemble regressor for rainfall amount and daily average temperature combines regression-based algorithms. All models are trained and tested on the dataset.

The results show that the ensemble-based models generally perform better than the individual machine learning models for rainfall occurrence, rainfall amount, and daily average temperature prediction. The Ensemble Classifier achieves the highest accuracy (83.41%) and recall (78.17%) in predicting the occurrence of rainfall. However, its precision of 51.16% is among the lowest, matching that of Random Forest. Although KNN achieves a lower accuracy of 80%, it has the highest reported F1 score, 82.53%. For rainfall amount prediction, the Ensemble Regression model achieves the lowest MAE (0.363691) of the four regressors, with an RMSE of 0.904688 that is second only to Random Forest (0.882860). For daily average temperature prediction, the Ensemble Regression model yields the lowest errors, with an MAE of 0.425209 and an RMSE of 0.545714. Overall, the ensemble methods are consistently among the best-performing models across all three tasks.
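
The voting classifier can be pictured as a majority vote over the base classifiers' predictions. The sketch below assumes hard (majority) voting. The text does not state whether hard or soft voting was used, so this is an assumption. The per-day votes are hypothetical, with one entry for each of the five base classifiers listed in Table 5.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the majority label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical rain (1) / no-rain (0) predictions for three days. Order:
# Logistic Regression, KNN, Decision Tree, SVC, Random Forest.
votes_per_day = [
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
]

ensemble_labels = [hard_vote(votes) for votes in votes_per_day]
print(ensemble_labels)
```

With five voters and two classes a tie cannot occur, so the majority label is always well defined. In scikit-learn the same idea is available through `VotingClassifier` with `voting="hard"`.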

The main objective of this research was the accurate prediction of rainfall occurrence and amount, along with the daily average temperature, using ensemble-based models built from machine learning models. An accurate weather forecast helps mitigate the challenges of heavy rainfall, especially in Bangladesh, which has an agriculture-based economy. In the future, ensemble-based models and other machine learning models can be applied to multiple datasets and their performance evaluated. Deep learning models can also be applied to these predictions and compared with machine learning models, including ensemble-based models.

Author Contributions

The contributions of the authors are as follows: Conceptualization, A. Hussain; methodology, A. Hussain; software, S. Tripura; validation, A. Hussain and A. Aslam; formal analysis, S. Tripura; investigation, A. Hussain; resources, A. Hussain and A. Aslam; data curation, A. Hussain and A. Aslam; writing—original draft preparation, A. Hussain and A. Aslam; writing—review and editing, A. Hussain and A. Aslam; visualization, A. Hussain and S. Tripura; supervision, A. Hussain; project administration, A. Hussain; funding acquisition, A. Hussain. All authors have read and agreed to the published version of the manuscript.

Funding

None.

Data Availability Statement

The data used in this research work is publicly available on Kaggle.

Conflicts of Interest

The authors declare no conflicts of interest.

References

S. Badhiye, P. Chatur, and B. Wakode, "Temperature and humidity data analysis for future value prediction using clustering technique: an approach," International Journal of Emerging Technology and Advanced Engineering, vol. 2, no. 1, pp. 88-91, 2012.
K. Pabreja, "Clustering technique to interpret Numerical Weather Prediction output products for forecast of Cloudburst," International Journal of Computer Science and Information Technologies (IJCSIT), vol. 3, no. 1, pp. 2996-2999, 2012.
A. Parmar, K. Mistree, and M. Sompura, "Machine learning techniques for rainfall prediction: A review," in International conference on innovations in information embedded and communication systems, 2017, vol. 3.
S. Kundu, S. K. Biswas, D. Tripathi, R. Karmakar, S. Majumdar, and S. Mandal, "A Review on Rainfall Forecasting using Ensemble Learning Techniques," e-Prime-Advances in Electrical Engineering, Electronics and Energy, p. 100296, 2023. [CrossRef]
M. E. Mann and P. H. Gleick, "Climate change and California drought in the 21st century," Proceedings of the National Academy of Sciences, vol. 112, no. 13, pp. 3858-3859, 2015. [CrossRef]
C. C. Stephan, N. P. Klingaman, P. L. Vidale, A. G. Turner, M.-E. Demory, and L. Guo, "A comprehensive analysis of coherent rainfall patterns in China and potential drivers. Part I: Interannual variability," Climate Dynamics, vol. 50, pp. 4405-4424, 2018. [CrossRef]
N. A. B. Klutse, B. J. Abiodun, B. C. Hewitson, W. J. Gutowski, and M. A. Tadross, "Evaluation of two GCMs in simulating rainfall inter-annual variability over Southern Africa," Theoretical and applied climatology, vol. 123, pp. 415-436, 2016. [CrossRef]
K. Sittichok, A. G. Djibo, O. Seidou, H. M. Saley, H. Karambiri, and J. Paturel, "Statistical seasonal rainfall and streamflow forecasting for the Sirba watershed, West Africa, using sea-surface temperatures," Hydrological Sciences Journal, vol. 61, no. 5, pp. 805-815, 2016. [CrossRef]
J. Wu, J. Long, and M. Liu, "Evolving RBF neural networks for rainfall prediction using hybrid particle swarm optimization and genetic algorithm," Neurocomputing, vol. 148, pp. 136-142, 2015. [CrossRef]
N. Singh, S. Chaturvedi, and S. Akhter, "Weather forecasting using machine learning algorithm," in 2019 International Conference on Signal Processing and Communication (ICSC), 2019: IEEE, pp. 171-174. [CrossRef]
S. Cramer, M. Kampouridis, A. A. Freitas, and A. K. Alexandridis, "An extensive evaluation of seven machine learning methods for rainfall prediction in weather derivatives," Expert Systems with Applications, vol. 85, pp. 169-181, 2017. [CrossRef]
N. Srinu and B. H. Bindu, "A Review on Machine Learning and Deep Learning based Rainfall Prediction Methods," in 2022 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS), 2022: IEEE, pp. 1-4. [CrossRef]
E. Dritsas, M. Trigka, and P. Mylonas, "A Multi-class Classification Approach for Weather Forecasting with Machine Learning Techniques," in 2022 17th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP), 2022: IEEE, pp. 1-5. [CrossRef]
S. Choi and E.-S. Jung, "Optimizing Numerical Weather Prediction Model Performance using Machine Learning Techniques," IEEE Access, 2023. [CrossRef]
S. Nigam, M. Gupta, A. Shrinivasan, A. V. S. Uttej, C. Kumari, and P. Disha, "Comparative Study to determine Accuracy for Weather Prediction using Machine Learning," in 2023 International Conference on Computer Communication and Informatics (ICCCI), 2023: IEEE, pp. 1-4. [CrossRef]
M. A. Rahman, L. Yunsheng, and N. Sultana, "Analysis and prediction of rainfall trends over Bangladesh using Mann–Kendall, Spearman’s rho tests and ARIMA model," Meteorology and Atmospheric Physics, vol. 129, no. 4, pp. 409-424, 2017. [CrossRef]
A. Mahabub and A. Habib, "An overview of weather forecasting for Bangladesh using machine learning techniques," Machine Learning, pp. 1-36, 2019.
H. Shaiba et al., "Weather Forecasting Prediction Using Ensemble Machine Learning for Big Data Applications," Computers, Materials & Continua, vol. 73, no. 2, 2022. [CrossRef]
H. Bosu, T. Rashid, A. Mannan, and J. Meandad, "Trends of Rainfall and Temperature in Bangladesh: A Comparative Analysis of CMIP5 Results and Meteorological Station Data," The Dhaka University Journal of Earth and Environmental Sciences, vol. 9, no. 2, pp. 9-18, 2020. [CrossRef]
F. Hashim, N. N. Daud, K. Ahmad, J. Adnan, and Z. Rizman, "Prediction of rainfall based on weather parameter using artificial neural network," Journal of Fundamental and Applied Sciences, vol. 9, no. 3S, pp. 493-502, 2017. [CrossRef]
J. Dong, W. Zeng, L. Wu, J. Huang, T. Gaiser, and A. K. Srivastava, "Enhancing short-term forecasting of daily precipitation using numerical weather prediction bias correcting with XGBoost in different regions of China," Engineering Applications of Artificial Intelligence, vol. 117, p. 105579, 2023. [CrossRef]
S. Paul and S. Roy, "Forecasting the Average Temperature Rise in Bangladesh: A Time Series Analysis," Journal of Engineering Science, vol. 11, no. 1, pp. 83-91, 2020.
J. Sulaiman and S. H. Wahab, "Heavy rainfall forecasting model using artificial neural network for flood prone area," in IT Convergence and Security 2017: Volume 1, 2018: Springer, pp. 68-76. [CrossRef]
B. T. Pham, D. Tien Bui, M. Dholakia, I. Prakash, and H. V. Pham, "A comparative study of least square support vector machines and multiclass alternating decision trees for spatial prediction of rainfall-induced landslides in a tropical cyclones area," Geotechnical and Geological Engineering, vol. 34, pp. 1807-1824, 2016. [CrossRef]
M. Kim, Y. Kim, H. Kim, W. Piao, and C. Kim, "Evaluation of the k-nearest neighbor method for forecasting the influent characteristics of wastewater treatment plant," Frontiers of Environmental Science & Engineering, vol. 10, pp. 299-310, 2016. [CrossRef]
S. Zainudin, D. S. Jasim, and A. A. Bakar, "Comparative analysis of data mining techniques for Malaysian rainfall prediction," Int. J. Adv. Sci. Eng. Inf. Technol, vol. 6, no. 6, pp. 1148-1153, 2016.
O. Sagi and L. Rokach, "Ensemble learning: A survey," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, p. e1249, 2018. [CrossRef]
N. S. Sani, A. H. Abd Rahman, A. Adam, I. Shlash, and M. Aliff, "Ensemble learning for rainfall prediction," International Journal of Advanced Computer Science and Applications, vol. 11, no. 11, 2020.
Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression-recent developments, applications and future directions," IEEE Computational intelligence magazine, vol. 11, no. 1, pp. 41-53, 2016. [CrossRef]
G. Kunapuli, Ensemble Methods for Machine Learning. Simon and Schuster, 2023.
"Weather Data Bangladesh." Kaggle. https://www.kaggle.com/datasets/apurboshahidshawon/weatherdatabangladesh (accessed 20 September, 2023).

Figure 1. Methodology.

Figure 2. Ensemble-based Classifier for Rainfall Occurrence Prediction.

Figure 3. Ensemble-based Regressor for Rainfall Amount and Daily Average Temperature Prediction.

Figure 4. Temperature Distribution: (a) Minimum Temperature; (b) Maximum Temperature.

Figure 5. Windspeed Distribution: (a) Minimum Windspeed; (b) Maximum Windspeed.

Figure 6. Humidity Distribution: (a) Humidity at 9 AM; (b) Humidity at 3 PM.

Figure 7. Pressure Distribution: (a) Pressure at 9 AM; (b) Pressure at 3 PM.

Figure 8. Average Windspeed per Month.

Figure 9. Average Humidity per Month.

Figure 10. Min, Max, and Average Temperature per Month.

Figure 11. Performance Comparison for Rainfall Occurrence Prediction.

Figure 12. MAE and RMSE Comparison for Rainfall Amount Prediction.

Figure 13. MAE and RMSE Comparison for Daily Average Temperature Prediction.

Table 1. Overview of Related work.

Refs	Models	Prediction	Limitation
[11]	Genetic Programming, Support Vector Regressor (SVR), M5 rules, M5 Model Trees, Radial Basis Neural Network	Rainfall amount	Uses traditional machine learning techniques
[17]	SVR, Linear Regression, Ridge Regression, Bayesian Ridge, Gradient Boosting, XGBoost, CatBoost, AdaBoost, KNN, Decision Tree	Windspeed, humidity, temperature, and rainfall amount	Rainfall occurrence prediction is not implemented; only regression-based algorithms are used
[20]	Multi-layer Perceptron (MLP)	Raindrop prediction using temperature, pressure, and humidity	Uses a single model only; rainfall occurrence and temperature prediction are not implemented
[21]	XGBoost Model	Rainfall amount	Rainfall occurrence prediction is not implemented
[22]	Linear Regression, Polynomial Regression, and SVR	Daily minimum, maximum, and average temperature	Rainfall occurrence and rainfall amount prediction are not implemented; uses traditional techniques
[23]	Artificial Neural Network (ANN)	Rainfall amount	Rainfall occurrence prediction is not implemented
[24]	Least Square Support Vector Machine (LSSVM) and Multi-class Alternating Decision Tree (MADT)	Rainfall prediction	Only rainfall prediction; uses a 1-year dataset only
[26]	Naïve Bayes, Decision Tree, Random Forest	Rainfall prediction	Small training dataset, 10 and 30% only
[28]	Ensemble-based model using Random Forest, SVM, NN, NB, and C4.5	Rainfall prediction	Low accuracy

Table 2. Average Windspeed per month.

Month	WindSpeed9am	WindSpeed3pm
01	15.285171	17.403042
02	15.468504	18.228346
03	15.989247	18.053763
04	16.466667	19.396296
05	16.580645	18.419355
06	15.077778	18.807407
07	14.612903	20.229391
08	13.645161	20.114695
09	13.818519	21.203704
10	13.896057	21.007168
11	14.922222	19.407407
12	15.207885	19.111111

Table 3. Average Humidity per month.

Month	Humidity9am	Humidity3pm
01	73.574144	57.136882
02	72.830709	56.468504
03	68.519713	52.698925
04	67.285185	52.374074
05	64.014337	49.906810
06	66.433333	54.000000
07	65.179211	52.333333
08	64.164875	52.867384
09	65.844444	56.100000
10	71.197133	59.139785
11	70.500000	58.514815
12	70.007168	55.211470

Table 4. Average Temperature per month.

Month	MinTemp	MaxTemp	Temp9am	Temp3pm
01	14.851331	22.621673	17.158555	21.422814
02	14.154331	21.984252	16.439764	20.783858
03	12.713620	21.339068	15.451971	20.127599
04	12.600000	21.145926	15.721111	19.736667
05	12.651971	21.878495	15.939785	20.335125
06	13.621852	22.000741	16.786296	20.429259
07	14.223297	22.521505	17.465591	20.882437
08	15.870251	24.025090	19.237276	22.390323
09	17.608148	25.315926	21.029630	23.648889
10	17.709319	24.853047	20.483154	23.400000
11	16.749630	24.394074	19.634815	22.865185
12	15.739785	23.900358	18.408602	22.443011

Table 5. Performance for Rainfall Occurrence Prediction.

Models	Accuracy	Precision	Recall	F1 score
Logistic Regression	0.823581	0.548173	0.714286	0.620301
KNN	0.800000	0.368771	0.740000	0.825270
Decision Tree	0.774672	0.594684	0.568254	0.581169
SVC	0.828821	0.528239	0.746479	0.618677
Random Forest	0.829694	0.511628	0.762376	0.612326
Ensemble Classifier	0.834061	0.511628	0.781726	0.618474
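
As a consistency check on Table 5, the F1 score can be recomputed from precision (P) and recall (R) as their harmonic mean, F1 = 2PR / (P + R). The sketch below is illustrative and uses only the values printed in the table. `f1_score` is a helper defined here, not a library call. Recomputing in this way reproduces the reported F1 for every model except KNN, whose listed precision and recall give about 0.492 rather than the reported 0.825270.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 5.
table5 = {
    "Logistic Regression": (0.548173, 0.714286, 0.620301),
    "KNN": (0.368771, 0.740000, 0.825270),
    "Decision Tree": (0.594684, 0.568254, 0.581169),
    "SVC": (0.528239, 0.746479, 0.618677),
    "Random Forest": (0.511628, 0.762376, 0.612326),
    "Ensemble Classifier": (0.511628, 0.781726, 0.618474),
}

for model, (p, r, reported) in table5.items():
    computed = f1_score(p, r)
    print(f"{model}: computed F1 = {computed:.6f}, reported F1 = {reported:.6f}")
```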

Table 6. Performance Comparison.

Models	Accuracy	Precision	Recall	F1 Score
Combination of (SVM, ANN, NB, C4.5, RF)[28]	75%	53%	73%	61%
Ours	83%	51%	78%	61%

Table 7. MAE and RMSE Comparison for Rainfall Amount Prediction.

Algorithms	MAE	RMSE
Linear Regression	0.498774	0.948272
Random Forest	0.378243	0.882860
SVR	0.365070	0.971967
Ensemble Regression	0.363691	0.904688

Table 8. MAE and RMSE Comparison for Daily Average Temperature Prediction.

Algorithms	MAE	RMSE
Linear Regression	0.470631	0.603241
Random Forest	0.450968	0.570240
SVR	0.434701	0.560317
Ensemble Regression	0.425209	0.545714

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
