1. Introduction
Rice is a major agricultural commodity in Thailand and an important contributor to the country's economy. According to data from the Food and Agriculture Organization of the United Nations (FAO), Thailand was the world's second-largest exporter of rice in 2018, with exports valued at around $7.4 billion [1]. Thailand is also one of the world's major producers of milled rice, producing approximately 20.3 million metric tons in 2018, reported as equivalent to approximately 17.6 million metric tons of paddy rice [1]. The main rice-producing regions in Thailand are the central, northeastern, and northern regions, with the central region accounting for the largest share of production [2]. The rice sector is a significant contributor to Thailand's GDP, accounting for approximately 3.3% in 2018 [3]. Rice farming is also a vital source of employment and income for many smallholder farmers in Thailand: the sector employs around 10 million people, or approximately 20% of the country's total workforce [2], and some estimates suggest that rice makes up as much as 60% of smallholder farmers' income [3]. As a result, rice yield has become an important variable for maximizing the efficiency of rice production and meeting the increasing demand for rice, especially as the world's population grows. Yet many factors can affect rice productivity, including environmental factors, physical factors, and the number of farmers. Thailand has faced these factors for many decades. Currently, land use change and climate change are major concerns for every sector, particularly in developing countries. Both are major drivers of crop yield variation and are expected to have significant impacts on agricultural productivity [4]. Climate change, through warming temperatures, extreme weather events, and altered precipitation patterns, can lead to yield reductions. Land use transformation, including the conversion of agricultural land to urban or industrial use, can also influence crop yields by altering the availability of land and resources for agriculture [5].
Moreover, natural disasters such as droughts and floods can significantly affect rice yield and production by damaging crops, disrupting the growing season, and reducing overall yield. These hazards may cause complete crop failure or have a more limited impact, depending on the severity of the event and the vulnerability of the affected region. Rice production is concentrated in certain parts of the world, such as Thailand, which may be more vulnerable to natural disasters because of their location and environment. For example, seven typhoons in 2021 caused flooding in Thailand that affected rice production on 0.85 million hectares of agricultural land, with estimated losses to farmers of around 220 million U.S. dollars, or 30% of productivity [6].
Additionally, drought is a common occurrence in Thailand, which has a tropical climate and is prone to dry spells and water shortages. Thailand suffered from long-term drought conditions that affected approximately 3.8 million hectares across the country in 2021, and drought is expected to become more extensive and more severe every year [7]. Every factor that impacts rice production can directly affect the growth phases of rice, producing symptoms such as reduced leaf area index (LAI), leaf deformation, stunted growth, leaves fading from green to pale, dwarfing, and leaf lesions.
Measuring crop yield over large agricultural areas is difficult under practical constraints on time, budget, and available surveyors. Recently, data-driven remote sensing has become an efficient way to assess crop conditions and predict crop yield without being physically present in the study area. It relies on various sensors and platforms, including satellites, that collect data on aspects of the Earth's surface such as land use, vegetation, and weather patterns. Many studies have used remote sensing data to forecast agricultural crop production [8,9,10,11,12]. Weather factors have long been used to explain crop yield fluctuations. For instance, the study in [13] applied machine learning (ML) to land surface temperature (LST), the enhanced vegetation index (EVI), and the normalized difference vegetation index (NDVI) from the MODIS sensor, together with weather variables, to improve soybean yield forecasts, achieving a mean absolute error of around 0.24 to 0.42 Mg/ha. The study in [14] used LST and air temperature to predict corn yield across the US with an R² of 0.56 to 0.65. In addition, [15] indicated that the eXtreme Gradient Boosting (XGBoost) method achieved the best metrics and reduced the prediction error for cereal yield in Morocco by combining remote sensing data and weather data. Several studies have also used drought and vegetation health indices computed from remotely sensed data, such as the vegetation health index (VHI) [16], the temperature condition index (TCI) [17], and the vegetation condition index (VCI) [18,19]. Combining such health and drought indicators with machine learning has been reported to improve crop production prediction [15].
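The condition indices named above can be illustrated with a short sketch. It assumes the commonly used formulation in which VCI and TCI rescale the current NDVI and LST against the multi-year minimum and maximum for the same location and period, and VHI is a weighted average of the two. The function names, the weight alpha = 0.5, and the sample values are illustrative assumptions and are not taken from the cited studies.

```python
def vci(ndvi, ndvi_history):
    """Vegetation condition index (0-100): position of NDVI within its historical range."""
    low, high = min(ndvi_history), max(ndvi_history)
    if high == low:
        raise ValueError("historical NDVI range is zero; VCI is undefined")
    return 100.0 * (ndvi - low) / (high - low)


def tci(lst, lst_history):
    """Temperature condition index (0-100): high when LST is cool relative to its history."""
    low, high = min(lst_history), max(lst_history)
    if high == low:
        raise ValueError("historical LST range is zero; TCI is undefined")
    return 100.0 * (high - lst) / (high - low)


def vhi(vci_value, tci_value, alpha=0.5):
    """Vegetation health index: weighted average of VCI and TCI (alpha is an assumed weight)."""
    return alpha * vci_value + (1.0 - alpha) * tci_value


# Hypothetical values for one pixel and one period of the growing season.
ndvi_history = [0.30, 0.45, 0.60, 0.50]   # NDVI in earlier years
lst_history = [300.0, 305.0, 310.0]       # LST in kelvin in earlier years

vci_now = vci(0.45, ndvi_history)
tci_now = tci(302.5, lst_history)
vhi_now = vhi(vci_now, tci_now)
print(round(vci_now, 2), round(tci_now, 2), round(vhi_now, 2))
```

Low values of any of the three indices indicate stressed vegetation, which is why they are natural candidate predictors for yield models.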
Accurate and up-to-date prediction of crop yields is essential for sustainable food security and agriculture: it gives farmers decision support for planting and harvesting and enables policymakers to plan for and address potential food shortages. ML and deep learning have increasingly superseded conventional regression approaches in providing precise and accurate predictions [20,21]. Several recent studies have evaluated the performance of ML algorithms, for instance support vector regression (SVR) [22], random forest (RF) regression [23], and XGBoost regression [24], in predicting crop production at local (i.e., province) scales. The study in [8] investigated eight different ML classifiers and regressors for forecasting winter wheat yield in China. The results indicated that SVR, RF, and Gaussian process regression (GPR) were the three best-performing methods, with R² > 0.75. ML approaches are popular and often perform well in crop yield prediction, but there is evidence that multivariate ordinary least squares can give a lower soybean yield prediction error than RF and long short-term memory (LSTM) networks [13]. Linear regression and ML regression have therefore been compared [25]. Moreover, because the hyperparameters of ML models are difficult to tune, grid search with cross-validation (CV) has been applied [26] to crop yield prediction. However, a number of studies have attempted to forecast agricultural yield at the regional level using remote sensing data without taking meteorological information into account, even though meteorological conditions are primary factors with a significant impact on crop yield. For instance, the study in [27] found that the root mean square error (RMSE) of predictions based on remote sensing data ranged from 14% to 49%. The study in [28] also illustrated how remote sensing data could be used to predict wheat yields in Australia; according to its findings, the RMSE varied with the research location and was between 0.07 and 0.25 t ha⁻¹. It remains unclear whether using remote sensing data alone or combining it with climatic data produces more accurate results, especially in tropical areas. The goal of this work is therefore to identify both the input datasets and the modeling methods that minimize crop production forecast error.
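The idea behind grid search with cross-validation can be sketched in a few lines. The example below is a simplified stand-in and does not reproduce the procedure of [26]: it tunes the penalty of a one-variable ridge regression rather than an RF, SVR, or XGBoost model, and the data, the grid, and all function names are hypothetical.

```python
import math


def fit_ridge_1d(x, y, lam):
    """Closed-form one-variable ridge regression; lam is the penalty hyperparameter."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, mean_y - slope * mean_x


def cv_rmse(x, y, lam, k=3):
    """k-fold cross-validated RMSE; every sample is held out exactly once."""
    n = len(x)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        x_train = [x[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_ridge_1d(x_train, y_train, lam)
        squared_errors.extend((y[i] - (slope * x[i] + intercept)) ** 2 for i in test_idx)
    return math.sqrt(sum(squared_errors) / n)


def grid_search(x, y, grid, k=3):
    """Evaluate every candidate value and return the one with the lowest CV RMSE."""
    scores = {lam: cv_rmse(x, y, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data: seasonal NDVI (x) and yield in t/ha (y) for nine province-years.
x = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]
y = [2.0, 2.2, 2.3, 2.6, 2.7, 3.0, 3.1, 3.3, 3.4]
grid = [0.0, 0.01, 0.1, 1.0]

best_lam, scores = grid_search(x, y, grid)
print("best penalty:", best_lam)
```

The same loop structure applies to models with several hyperparameters; the grid then becomes the Cartesian product of the candidate values for each one.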
The objective of this study was to test the capability of multiple linear regression (MLR) and machine learning models (RF, XGBoost, and SVR) to predict crop yields. The models use several predictor variables, including indices derived from satellite images and climate variables. Before the models were fitted, a variable selection process was carried out, and the variables associated with crop yield in the Chi Basin area are reported. Different combinations of predictor variables were tested. All models were used to predict crop yields at the provincial scale, and their R² and RMSE were analyzed. Predicted trends were compared with the testing data, and the selected model was used to predict crop yields in future years.
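As a reference for the two evaluation metrics, the following sketch computes RMSE and R² for a small set of observed and predicted yields. It assumes that R² denotes the coefficient of determination (one minus the ratio of residual to total sum of squares); the study may instead use the squared correlation coefficient. The sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the yield (e.g. t/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed yield."""
    return 100.0 * rmse(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical provincial yields in t/ha.
observed = [2.0, 2.5, 3.0, 3.5]
predicted = [2.1, 2.4, 3.2, 3.3]

print(round(rmse(observed, predicted), 3))
print(round(r_squared(observed, predicted), 3))
print(round(relative_rmse(observed, predicted), 2))
```

Reporting the relative RMSE alongside the absolute value makes results comparable with studies that express error as a percentage.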
4. Discussion
Monitoring, mapping, and predicting crop production over large regions can help farmers and policymakers make the best decisions for sustainable management, particularly in the Chi basin region, which is a major producer of crops in Thailand. This is especially important at present, as natural hazards often affect tropical monsoon areas. Additionally, climate change is one of the most important problems facing the agricultural sector worldwide. Crop yield is crucial for global food security, so it is important to monitor and report threats to crop production. Accurate and timely early estimation of crop production can support trade and proper food management. There are various approaches to estimating crop yield [49,50,51]. Predictive models for crop yield have been developed using remote sensing data and ML methods [52,53]. However, these approaches may not always provide accurate results. The study in [10] applied the NDVI to forecast crop production in the Canadian Prairies, with R² values ranging from 0.8 to 0.9, whereas the study in [51] used MODIS EVI and LAI data to predict rice crop production in Vietnam's Mekong Delta and found maximum correlation coefficients at the crop growing stage of 0.70 and 0.74, respectively.
Agricultural production relies on environmental conditions, such as climatic variables (rainfall, temperature, humidity, and solar radiation) [54], so climatic and remote sensing data have been integrated for crop yield prediction [55], an approach consistent with the findings of this study. This study compared and evaluated various approaches and predictor variables for predicting crop yield at the provincial scale in the Chi basin, Thailand, one to two months before harvest. It found that combining satellite imaging data with climatic data improved the accuracy of crop yield prediction in the Chi basin. The results showed that the LST_night, NDVI, TCI, and Tmean data perform well when used with the XGBoost algorithm, giving an R² value of up to 0.95. This combination of data also reduced the RMSE to 0.18 ton/ha. XGBoost, a non-parametric technique that builds decision trees and combines them through boosting to make predictions, performed very well, in line with [15], which reported that fusing remote sensing-based drought, climatic, and weather indicators improved accuracy when used with the XGBoost model for cereal yield forecasting. The temporal trend of crop yield predicted by XGBoost was close to the actual crop yield data; however, in 2018 the predicted and actual yields differed by about 0.05 ton/ha, owing to natural hazards.
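How boosting combines decision trees can be shown with a deliberately small sketch. It implements plain gradient boosting for squared error with one-split trees (stumps); it is not XGBoost itself, which adds regularization, second-order gradient information, and deeper trees. The predictor values, yields, and function names are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, l_mean, r_mean)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def fit_boosting(X, y, n_rounds=100, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def mse(model, X, y):
    return sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)


# Hypothetical predictors (night-time LST in degrees Celsius, NDVI) and yields in t/ha.
X = [[20.0, 0.3], [22.0, 0.5], [24.0, 0.6], [26.0, 0.7], [28.0, 0.4], [30.0, 0.8]]
y = [2.0, 2.6, 3.0, 3.3, 2.2, 3.5]

mse_base = mse(fit_boosting(X, y, n_rounds=0), X, y)
mse_1 = mse(fit_boosting(X, y, n_rounds=1), X, y)
mse_100 = mse(fit_boosting(X, y, n_rounds=100), X, y)
print(mse_base, mse_1, mse_100)
```

The training error falls as rounds are added because each stump corrects part of what the previous ones missed. In practice the number of rounds and the learning rate are tuned on held-out data to avoid overfitting.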
In 2018, floods affected 66 provinces, or 420 districts [56], and destroyed several agricultural areas, especially rice-growing areas located in lowlands. According to [15], rainfed rice production is expected to decrease by around 5% from 2021 to 2029, which differs from the result of our study, which predicts a decrease in yield of around 0.078 million tons per year from 2020 to 2022. In addition, drought impacts are expected to affect crop yield predictions in Thailand by about 5% mean absolute percentage error (MAPE) [57]; this can be teleconnected with the El Niño-Southern Oscillation [58]. According to the total crop yield predictions for 2020 to 2022 (Table 6), predicted crop yield has fluctuated, and this trend is likely to continue in the near future due to climate conditions. Moreover, climate change has a considerable influence on the agricultural sector and could raise temperatures by 1.4 to 5.8 degrees Celsius by 2100 [4]. This would increase crop water requirements through increased evapotranspiration, which would in turn affect crop production [59]. This study shows acceptable accuracy for crop yield prediction that can be used by policymakers for management at the country and province scales. Because the proposed methodology forecasts crop yield accurately, it could serve as a guideline for crop yield prediction in other study areas and support policymaking to drive the economy at the provincial or country scale, as rice is the main staple crop in Thailand and an important source of export income for the country.
Rice yield in Thailand is important to the trade and economic performance of the whole region and contributes to overall GDP. Improvements in yield can be attributed to a number of factors, including the adoption of modern agricultural technologies such as hybrid seeds and precision agriculture, as well as improvements in irrigation and fertilization practices. In addition, Thailand has a well-developed infrastructure for agriculture, including a network of roads, ports, and storage facilities that facilitate the transportation and distribution of crops. Despite these improvements, crop production in Thailand can still be affected by various factors, such as drought and extreme weather events, which can lead to fluctuations in yield from year to year. Market demand and crop prices can also influence production trends, as farmers may choose to plant crops that are more in demand or more profitable. Finally, a decrease in crop yield may lead to greater use of pesticides, fertilizers, and other chemical inputs, which can harm the environment through pollution and degradation of natural resources. It is therefore important to apply the proposed approach to early crop yield prediction and to take steps to maintain high crop yields and sustainable development policies in order to minimize these negative consequences.