2.1. Northeastern South America Study Area
Northeastern South America (NE South America) is taken to encompass mainland South America between the parallels 1° and 18°S and the meridians 35° and 47°W and spans a total area of 1.6 million km
2 (see
Figure 1). It is a home to around 53.1 million individuals [
29]. NE South America represents a large geographical area covering semi-arid climate with low and irregular rainfall, high temperatures, and high evaporation rates [
28]. Within the NE South America, other types of climates exist, depending on location, relief, and vegetation influences [
28].
The climate pattern of NE South America is characterized by a transition in rainfall from the dry inland (Caatinga biome;
Figure 1) to the humid Atlantic coast (Mata Atlântica biome). Mean annual rainfall increases steadily from less than 800 mm in the semi-arid interior to 1,800 mm at the coast (
Figure 2b). The semi-arid area covers 60% of the region. Within the NE South America, the wettest season typically occurs from December to April, while the dry season extends from July to October [
22]. The rainfall regime is mainly influenced by the seasonal migration of the Intertropical Convergence Zone (ITCZ), El Niño Southern Oscillation (ENSO) and the Tropical North Atlantic sea surface temperature. Severe droughts happened during El Niño events in 1983, 1998, and 2016, and due to warm surface waters in the Tropical North Atlantic between 2012 and 2018 [
8,
22].
Vegetation within NE South America was classified into four six land cover types, including both unmanaged native and managed agricultural vegetation (
Figure 1;
Figure 2a). The four most prevalent land cover types within NE South America are Caatinga, Cerrado, Atlantic Forest, and Amazon Forest, covering 52.5%, 29.4%, 10.7% and 7.4% of land area, respectively [
30]. The
Caatinga ecosystem (
Figure 1) covers 735,000 km² of NE South America and is characterized by a mosaic of xerophytic vegetation [
22]. The term “Caatinga” (Caa = forest, tinga = white) comes from the Tupi language, and is used synonymously with Steppe Savannah as defined by Trochain [
17,
22]. Typical Caatinga is composed of woody vegetation with discontinuous canopy (three to nine meters). Most Caatinga plants are formed with a fearsome array of thorns that emerge from microphyllous foliage - lost during the periodic droughts. The ground layer is rich in bromeliads, annual herbs, and geophytes. Typical species include
Amburana cearensis,
Anadenanthera colubrina,
Aspidosperma pyrifolium,
Poincianella pyramidalis and
Cnidoscolus quercifolius [
20].
One prominent feature observed in the Caatinga land cover during a prolonged period of drier-than-normal rainfall conditions is a gradual weakening in vegetative greenness [
22]. The vulnerability of the Caatinga to periodic droughts is further exacerbated by high levels of habitat degradation [
25]. Indeed, the biome is one of the most threatened in Brazil due to widespread deforestation for farming and mineral extraction [
31]. Despite the fauna and flora of the Caatinga biome region being clearly adapted to periodic droughts, some scientists believe that they may already be operating at their physiological limits [
20]. Prolonged and frequent occurrences of droughts present significant challenges to flora and fauna [
18].
2.3. The Standardized Precipitation Index (SPI)
The SPI, proposed by McKee et al. [
36], is an index that monitors drought conditions by exclusively considering rainfall data. It is commonly used to monitor meteorological drought. In this study, the SPI was calculated based on the quarterly scale (SPI-3), a common timescale used to assess the effects of meteorological drought [
23]. The rainfall time series is initially fit to a gamma distribution, which is then transformed into a normal distribution using an equal probability transformation [
36]. Daily rain gauge totals from the INMET’s network stations in Brazil from 2004–2022 were used for the calculation of the SPI [
23]. In general, negative SPI values indicate a dry period, and positive values indicate a wet period. The SPI-3 time series were structured in one numerical matrix referred to as the SPI3 matrix, based on Barbosa et al. [
17]. The probability density function for the gamma distribution is given by the expression:
where α > 0 is the shape parameter, β > 0 is the scale parameter, and x > 0 is the total accumulated precipitation over a three-month period (called the time scale). Γ(α) represents the gamma function, which is defined by the integral [
37]:
The gamma function was evaluated either numerically or using tabulated values depending on the value of α. A maximum-likelihood estimation based on the method of L-moments was used to estimate parameters α and β [
38]. The probability density function, g(x), is then integrated with respect to x to obtain an expression for the cumulative density function, G(x), which represents the accumulated rain that has been observed for a given month and time scale:
Although negligible rainfall amounts are frequent in NE South America [
22,
39], G(x) is not defined at x = 0 [
40]; therefore, G(x) was calculated following the approach of Stagge et al. [
41]:
where np refers to the number of zero-rainfall events, n is the sample size, po is the observed likelihood of zero-rainfall events, and D(x) is the cumulative density function for observed precipitation. Finally, D(x) is transformed into a normal standardized distribution using a zero mean and unit variance, from which the SPI drought index using the 3-month time scale (SPI3) was obtained.
2.4. Statiscal Analyses
To understand the regional scale in vegetation response to rainfall extremes on the magnitude of anomalies in NDVI and SPI3, the statistical analyses over NE South America were computed using the 18-years NDVI-rainfall anomalies from 2004 to 2022. The approach was carried out in four main steps, as response to seasonal and interannual variations in hydroclimatic conditions across space. The first step involved rearranging the NDVIat and SPI3t matrices to obtain two matrices that are referred to as NDVIat and SPI3t, respectively. During the second step, a Principal Component Analysis (PCA) was applied to the NDVIat and SPI3t matrices [
23]. PCA was performed by computing the eigenvectors of the covariance matrix, while the varimax method was used to provide an orthogonal rotation.
It is important to note that the original PCs are associated to an arbitrary coordinate system. The rotation procedure changes the PCs to another coordinate system that yields a better separation of the PCs in the spatial context [
23]. Unlike other orthogonal rotations, the rotation maximizes the sum of the variances of the squared loadings (squared correlations between variables and PCs) [
42]. The number of PCs retained was based on screen plots for NDVIa and SPI3, which show the variances for each against the number of PCs (criterion known as the Kaiser’s rule).
The principal component scores for the retained PCs were computed and resulted in two matrices, which are referred to here as NDVIat and SPI3t scores, respectively. Each score matrix has a dimension equal to rows × n PCs, where n is the number of retained PCs. The scores matrices were scaled by column during the third step of this approach by the mean and the standard deviation.
A Canonical Analysis (CA) was then performed on the NDVIat and SPI3t scores matrices. The
k groups used is based on two criteria: a) minimization of the sum of squares of distances between each grid cell and the assigned cluster center, and b) verifying that the linear correlation coefficient between the centers of clusters is less than 0.36 [
43]; if the algorithm did not converge based on this criterion, then the first relative minimum that was found was selected [
44]. The algorithm of Hartigan and Wong [
45] was used to perform the iterations, and the Euclidean distance was used as the distance measure.
The fourth and final step involved combining the NDVIat scores matrix (216 ×
n retained PCs) with the NDVIa clusters defined by the CA (represented by a row × 1 multinomial vector with), which result in a matrix that is referred to as NDVIac [column × (
n retained PCs + 1)]. A discriminant model that considers the multinomial vector as a factor specifying the class for each observation and the scores of retained PCs as discriminators was then fit [
46]. The moment method has been used to standardize estimators of the mean and variance. The clusters with different classifications were re-coded according to discrimination based on linear discriminant analysis (LDA). The LDA was then applied to the SPI3 score matrix as well. The final products of this approach were two multinomial vectors containing the classifications for NDVIa and SPI3 variables at monthly scales.
Overall, eight PCs retained 46 percent of the variance for NDVIa, while eight PCs retained 80 percent of the variance for SPI3. The clustering based on the k-means method identified eight similar groups for NDVIa and SPI3 variables. The LDA-based classifier improved the CA-based discrimination; e.g., the NDVIa variable has been structured in the groups N1, N2, N3, N4, N5, N6, N7 and N8, which contain 20, 55, 36, 40, 30, 15, 82 and 104 members, respectively, while the SPI3 variable has been reduced to the groups S1, S2, S3, S4, S5, S6, S7 and S8, which contain 44, 62, 56, 32, 49, 57, 49 and 33 members, respectively.
Figure 4 shows a flowchart that illustrates the procedure described above.
It should be noted that the order of the identified clusters (N1 to N8) is not related to the spatial average of the NDVIa or SPI3 in NE South America; the order has been defined randomly by the applied algorithms used during the discrimination process while only considering the similarity between members as a criterion for clustering. Therefore, these clusters have been reclassified based on the median calculated from spatially averaged SPI3 and NDVIa values over the NE South America (
Figure 5 and
Figure 6) and will be referred to as patterns throughout the remainder of this study. The median was chosen for this purpose as it allows splitting the upper half of the NDVIa or SPI3 time series averaged over the entire NE South America area from the lower half.
This approach has linked the clusters N1, N2, N3, N4, N5, N6, N7 and N8 with the NDVIa patterns F5, F4, F7, F1, F6, F8, F2 and F3; and the clusters S1, S2, S3, S4, S5, S6, S7 and S8 with the SPI3 patterns F2, F6, F1, F8, F4, F7, F3 and F5. It is important to note that this new order provides information concerning the dominant process that is taking place, e.g., regarding the NDVIa patterns, F1 indicates the occurrence of a widespread greening over the entire NE South America (highest median), while F8 indicates the occurrence of a widespread browning in the entire study area (lowest median). Similarly, for the SPI3 patterns, F1 indicates the occurrence of widespread wet conditions over the NE South America (highest median), while F8 indicates the occurrence of widespread drought conditions over the NE South America (lowest median).