3.1. Drought Index SPEI
Drought describes a long, dry period. It is a major factor with regards to harms on vegetation and crop yields. To quantify droughts, several indices were developed [
22]. For this work we compute the Standardised Precipitation-Evapotranspiration Index (SPEI), as it is recommended by several sources we presented in
Section 1.1 and as the necessary data is available. The SPEI is an extension of the previously and currently used Standardised Precipitation Index (SPI) [
23], which scales precipitation over a long period of time, to calculate the duration and intensity of droughts that occurred within an observed time frame. The SPEI was presented in 2010 by Vicente-Serrano et al [
2]. The idea behind it is to combine the advantages of the SPI and PDSI (Palmer Drought Severity Index [
24]), while negating their disadvantages. It is a multi-scalar index that uses precipitation and evapotranspiration as input in order to calculate the beginning, duration and intensity of drought periods.
The SPEI calculates the difference (D) between precipitation (Prec) and potential evapotranspiration (PET)
for all months i in the dataset, before standardising and scaling D. Due to the full calculation being too extensive for this paper, we refer to [
2] for the detailed description. The resulting index values range from
, which describes an extreme drought, to
which describes an extremely wet period. For each time series with values for precipitation and evaportranspiration a calibration period is selected. Data from the calibration period will be used to set the scaling parameters for the whole time series. The mean of the SPEI during the calibration period is 0 and the variance 1.
Table 1.
Classification of SPEI-values
Table 1.
Classification of SPEI-values
SPEI |
Classification |
|
extremely wet |
|
severely wet |
|
moderately wet |
|
nearly normal |
|
moderately dry |
|
severely dry |
|
extremely dry |
The SPEI is computed on the data provided by meteoblue with the Python package "climate_indices" [
25]. A variety of calibration periods were tested. In this work we used calibration periods ranging from
unless specified otherwise.
Table 2 gives an overview for the SPEI values for Saxony for the different scales. As can be seen, the mean values are negative, which means the time not included in the calibration period was generally more dry. Furthermore, the occurrence of long droughts has increased as the SPEI 6 value is the lowest. A visual representation is given in
Figure 1 in which the South-Moravian SPEI time series are shown. These plots also indicate a strong increase in frequency and severity of droughts in recent years. Additionally, they serve to illustrate that time series of an SPEI with a higher scale are much smoother compared to the SPEI 1. This also makes the prediction much easier which will be shown and discussed in
Section 4 and
Section 5.
3.2. LSTM
An artificial neural network (ANN) is a type of machine learning model inspired by the function and structure of the human brain. ANNs consist of interconnected nodes – artificial neurons – organized in layers, which process information, learn from it and transmit it. ANNs can be trained to perform various tasks such as classification and pattern recognition. Recurrent Neural Networks (RNNs) are a specific type of ANNs that can process sequential data by using feedback connections, allowing information to persist from one step to the next. Long Short-Term Memory (LSTM) [
1] is a variant of RNN that addresses the vanishing gradient problem, which can occur when training traditional RNNs. LSTMs employ memory cells and gating mechanisms that can store information over longer periods of time and selectively retain or forget information based on the input data. These mechanisms make LSTMs effective at capturing long-term dependencies in sequential data and have been successfully applied in various fields such as language processing and time series analysis [
26,
27].
Figure 2 depicts the architecture of an LSTM unit. At time step
t, the unit uses the input sequence
, the hidden state
and the memory cell internal state - which serves as the long-term memory
, both from the previous time step, to update the hidden state
and memory cell internal state
. The LSTM unit has three gates to control the flow of information by employing activation functions. The sigmoid activation function
transforms values to the range
; this helps to update data or forget it, a value of 0 means that the component is irrelevant and should be forgotten and a value of 1 indicates that the component is relevant and shall be retained. The hyperbolic tangent function
transforms values to the range
as means of regulating the network.
The
forget gate controls which information stored in the memory cell to forget.
The
input node (or cell candidate state)
, provides a proposal for new values that could be added to the cell state.
The
input gate governs which new information provided by
will be encoded into the cell state.
Next, the cell state at time
t,
is computed by employing
to determine how much information to use via
and by employing
to determine how much information to retain from the old cell internal state
, where ⊙ is the Hadamard product operadwdtor, i.e., elementwise multiplication of of vectors.
The
output gate controls which information from the current cell state should used be in the output.
Finally, the resulting values
and
are utilized to compute the hidden state at time
t,
.
Here are weight parameters of the current input sequence, , , , and are weights of the previous hidden state and are bias parameters.
The training of the LSTM is done with the Keras framework[
28], which is based on Tensorflow [
29]. The input data comprises of the SPEI time series. We experiment with different splits for the training and validation data:
Split the data by region. Training set has SPEI-values for all dates.
Split the data by time with validation data consisting of the most recent years.
Split the data by time, using the first available years as validation data.
The reason for different splits for the training and testing data is the proximity of neighboring timeseries. As the data is very similar in many cases, a random split would lead to overfitting. By setting the split manually according to the listed rules, the goal is to decrease the amount of nearly identical timeseries in different parts of the dataset. In the first case all sets contain data spanning the whole temporal range.
The 2nd and 3rd cases contain temporal splits. Theoretically, the 2nd case is more intuitive, as for any given time series as input, the value to predict comes after the input. This is also true for the 3rd case, but in contrast, the timeframe of the test set precedes the timeframe of the training set. This is an experimental setup to test whether the results differ from using the standard order.
For the final test dataset, we obtained SPEI time series for a region located East of Saxony. When training and validating a LSTM in a single region, such as Saxony, a split of approximately is used. If the validation region is different, such as South Moravia, the full data of the first region is used during the training.
The training process consists of applying LSTM to sequences of consecutive SPEI-values represented as
to predict the subsequent SPEI-value represented as
y. A training data example is presented in
Table 3 where four SPEI-values are used to predict the fifth. Depending on the scale
consist of SPEI-values for
n months, where
. The table illustrates that each input sequence is the previous input sequence shifted to the left, with the previous output value
y being used as the last input
. To determine the feasible range of input sequence lengths we conducted an experiment comparing LSTMs with sequence lengths varying between 1 and 120 values.
Due to the lack of sufficient training data we employ data augmentation by replicating each training sequence 10 times and adding a Gaussian noise with a mean value of 0 and variance of
to each element of our input sequences. Additionally, to enhance the robustness of the LSTMs, the added noise is recalculated at each iteration of the training process. Thus, during training, the example presented in
Table 3 may transform into the one shown in
Table 4.
3.3. Measuring the effect of climatic factors on crop yields
To investigate the impact of extreme meteorological events, such as drought and frost, on crop yields, we used correlation and regression methods. As described in
Section 2.2, the crop yield data is available in yearly time series, while the SPEI data is monthly. To ensure that the time series are of equal frequency, we divided the SPEI time series by month, resulting in 12 different annual time series for each SPEI scale. As the tests covered scales 1, 2, 3 and 6, we thus derived 48 time series for each coordinate. The median values were then calculated for each time series set accross all available coordinates. For instance, the SPEI 3 value for April in a given year represents the median of all SPEI values from the entire region. Correlations between the SPEI time series and the available crop yield time series were measured. Several example results are presented in
Section 4.2. Correlations were measured on the raw crop data and after applying a two-step detrending method. In the first step, a regression was calculated on the data to find the general trend in yield values. The trend was then subtracted from the yield data. This eliminates the effect of climate change on crop yields, as well as other factors that cause long-term changes in crop growth, such as technological advancements. Following this, standardization is carried out via a z-score transformation (Equation
7)).
where
X is the crop yield value,
is its expected value, and
is its variance. The transformation has the properties of
and
.
Figure 3 shows a comparison of the raw and detrended yields of wheat in Saxony.
Other climatic factors that can affect plant growth are frost occurrences and Growing Degree Days (GDD) [
30]. To examine the influence of frost, we extracted several parameters from the minimum temperatures. This includes the duration of frost-free periods, the occurrence day of the last spring frost and the occurrence day of the first winter frost for each year. No trend was found for the duration of the frost-free period in Saxony, as shown in
Figure 4. The figure shows two scatter plots for the annual number of frost-free days with a corresponding regression line for two locations in Saxony. Depending on which location - and therefore distinct time series - is selected, the trend of the frost-free duration points towards a different direction.
The GDD [
30] is a heuristic used to estimate the stages of plant growth, phenological phases, by measuring heat units. It is based on the dependence growth stage on the accumulated heat during the growing season. GDD can be calculated on a daily basis using the following formula:
where
and
are the daily maximum and minimum temperatures, respectively, and
is the base temperature. The base temperature is dependent on the plant and describes the temperature required for plant growth. GDD values are calculated so that they can never fall below 0 (e.g., in the case of
) and can take values greater than 1 for a given day. This work uses the accumulated GDD (AGGD), which is a time series for a single year with daily values representing the sum of GDD up to that day.
Figure 5 shows example plots for AGGD in Saxony given a base temperature of
for the years 1990 (
Figure 5a) and 2020 (
Figure 5b). It can be observed that some locations are generally warmer than others, which was also the case for the years in between.
A correlation analysis was conducted between yearly crop yields and any of the time series for frost-free periods, day of last spring frost, or day of first winter frost. The results showed only minor correlations at best. Additionally, there was no significant correlation found between GDDs at certain times and crop yields.
To further investigate the effects of climatic factors on crop yields, we developed a new method that combines the inputs that are insufficient on their own. We measure the synergistic effects of different factors by implementing a second-order polynomial regression to predict crop yields. The metric we use to evaluate the regression is the
score between the measured crop yields and the predictions. Due to the limited size of the data, the analysis was conducted on the full dataset without removing any entries for testing purposes. The feature size was restricted to three. After testing a variety of configurations, the resulting regression model includes values for the monthly SPEI, the number of days since the last spring frost, and the accumulated GDD at the time of the last spring frost. Other configurations may achieve similar or partially better results. The aim of this approach is not to attain the highest possible score, as this method is prone to overfitting and generating false positives. Therefore, the described features are used as the final configuration for this study. The results of this experiment are presented in
Section 4.2