1. Introduction
Fire is common in the Cerrado biome, mainly in open plant physiognomies dominated by herbaceous and grassland vegetation. According to Miranda et al. [1], fires in the Cerrado, as in other savannas, can be characterized as surface fires that consume the fine fuels of the grass layer. Surface fine fuels are the layer of particles less than 0.6 cm thick, consisting of litter, grasses, herbs, and downed woody material [2]. For Rothermel [3], fuel is the organic matter available for ignition and combustion and is the only fire-related factor that can be controlled by human action. Knowing the fuel characteristics is essential for determining fire behavior and for decision-making in integrated fire management and wildfire suppression. However, determining fuel characteristics is complex in both time and space [4,5]. Arroyo et al. [6] reported that determining fuel characteristics demands high costs and considerable sampling time. Roberts et al. [7] identify the following attributes as essential for understanding fire behavior: fuel type, fuel biomass, fuel moisture, and fuel condition (live or dead). Fuel load is a crucial fuel characteristic, commonly used in fire management strategies such as fire risk assessment and fire behavior prediction [3]. Nevertheless, few studies on the Cerrado have determined the characteristics of its flammable material.
Remote sensing techniques have become essential for estimating several fuel characteristics as more studies and improvements are made. According to Roberts et al. [7], products obtained from various remote sensing techniques can help assess wildfire hazards; these include (i) direct measurements of live fuel moisture, (ii) measurements of live herbaceous biomass, (iii) measurements of fuel condition, and (iv) detailed classifications of fuel type. Van Wagtendonk and Root [8] note that fuel information is often presented as fuel model maps; fuel models are used to determine fuel load, size, depth, and moisture of extinction. A wide range of studies have estimated fuel variables from remote sensing products. However, there is a significant lack of studies on the surface fuel of the grassland environments of the Cerrado biome. Several studies characterize and estimate forest biomass using Landsat imagery [9,10,11,12], MODIS products [13,14], and Lidar sensors [15,16]. Most studies that use remote sensing products characterize forest variables; few estimate surface fuel variables, which are more closely related to the surface fires that recur in grassland and savanna areas.
Vegetation indices have been widely used in biomass studies and, above all, to determine fuel moisture [17,18]. Other studies have applied spectral mixture techniques [19,20]. However, few address the physical characteristics of surface fuel, with the exception of Franke et al. [21], who applied spectral mixture techniques to map fine fuel accumulation. In this context, the main objectives of the present study are as follows:
- (1) To evaluate the performance of multiple linear regression equations fitted to estimate the load of live and dead fine fuel classes at the beginning and end of the dry season, based on the reflectance of Landsat 8 OLI images, vegetation indices, and fraction values (F-values) from the spectral mixture analysis (SMA);
- (2) To assess the use of the Random Forest and k-Nearest Neighbor algorithms to estimate fine fuel load in different classes in comparison to traditional multiple linear regression analyses;
- (3) To analyze the importance of each predictor variable from remote sensing products in the Random Forest models.
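To make objective (3) concrete, the following is a minimal sketch of a permutation-based importance measure expressed as a percentage increase in mean squared error. It is an illustration under stated assumptions, not the procedure of any particular Random Forest package: the function names, the synthetic data, and the fixed `stand_in_model` (used in place of a fitted forest) are all hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model over (row, target) pairs."""
    return sum((t - model(r)) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Average percentage increase in MSE after shuffling each predictor in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other predictors untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(100.0 * (mse(model, permuted, targets) - baseline) / baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data (hypothetical): the target depends on the first predictor only.
rows = [(float(i), float((7 * i) % 5)) for i in range(12)]
targets = [3.0 * x0 + (0.5 if i % 2 == 0 else -0.5) for i, (x0, _) in enumerate(rows)]


def stand_in_model(row):
    """Fixed predictor standing in for a fitted model (assumption for illustration)."""
    return 3.0 * row[0] + 0.0 * row[1]


importances = permutation_importance(stand_in_model, rows, targets)
```

In this toy setting, shuffling the informative predictor inflates the error, whereas shuffling the unused predictor leaves it unchanged, so its importance is zero.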
4. Discussion
Given the low correlations obtained for the live fuel classes (live grass and live shrub), the statistical analyses for their estimation could not be conducted. However, according to the findings of Santos et al. [70], the low moisture level of Cerrado grassland species is the main factor behind the increase in the fuel’s flammability. Moreover, based on the fuel characterization by Santos et al. [71], the proportion of fine dead fuel relative to live fuel during the dry season in the biome can reach 70%, or over 80% when only the grass fuel of the area is considered. In addition, the small thickness of 1-h and grass fuels in a physiologically inactive state (e.g., grass and 1-h downed wood debris) makes their moisture content more sensitive to changes in atmospheric conditions, which affects their ignition capacity and leaves areas dominated by these fuels more prone to wildfires, especially in the dry season [72]. As reported by Slijepcevic et al. [73] and Soares et al. [74], low fuel moisture is one of the main factors contributing to the occurrence of large fires.
Of the multiple linear regression equations fitted for the beginning and end of the dry season in the Cerrado biome, those fitted for the beginning of the dry season had the best statistical performance. This may be explained by the results of Santos et al. [71], who characterized the fuel of Cerrado grassland vegetation. In that study, fuel characteristics such as the number of individuals, the number of species, the grass height, and the fuel load in different timelag classes behaved differently between the first and last months of the dry season in the physiognomy under analysis. Figure 7 presents the fuel condition mapping, showing the fractions of NPV, GV, and soil for the beginning of the dry season (May) in contrast to the end of the dry season (September). For example, given the different ages of the fuels present in the area, the dead grass fuel load in the first months of the dry season (May and June) differed significantly between fuel ages of one and four years. By the end of the dry season (August and September), no statistically significant difference was found among the dead grass fuel loads for ages between two and four years (at the 0.05 significance level).
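The NPV, GV, and soil fractions referred to above come from spectral mixture analysis, in which each pixel's reflectance is modelled as a linear combination of endmember spectra. The sketch below illustrates the idea with an unconstrained least-squares solution; the exact SMA variant used in the study is not described in this section, and the endmember reflectances, band count, and function names are hypothetical values chosen for illustration.

```python
def solve_linear_system(a, b):
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def unmix(pixel, endmembers):
    """Least-squares fractions f minimising ||pixel - sum_k f_k * endmember_k||."""
    names = list(endmembers)
    # Normal equations: (E^T E) f = E^T r
    ete = [[sum(x * y for x, y in zip(endmembers[p], endmembers[q])) for q in names]
           for p in names]
    etr = [sum(x * y for x, y in zip(endmembers[p], pixel)) for p in names]
    return dict(zip(names, solve_linear_system(ete, etr)))


# Hypothetical endmember reflectances for six bands (blue, green, red, NIR, SWIR1, SWIR2).
ENDMEMBERS = {
    "NPV":  [0.08, 0.12, 0.18, 0.30, 0.40, 0.30],
    "GV":   [0.03, 0.06, 0.04, 0.45, 0.22, 0.10],
    "Soil": [0.10, 0.15, 0.22, 0.28, 0.35, 0.33],
}

# Build a synthetic pixel from known fractions, then recover them.
true_fractions = {"NPV": 0.5, "GV": 0.2, "Soil": 0.3}
pixel = [sum(true_fractions[n] * ENDMEMBERS[n][b] for n in ENDMEMBERS) for b in range(6)]
fractions = unmix(pixel, ENDMEMBERS)
```

Operational implementations usually add constraints (fractions that are non-negative and sum to one) and a residual term; these are omitted here to keep the sketch short.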
Across all the multiple linear regression equations fitted in the study (beginning, end, and entire dry period), the R² values ranged from 0.45 to 0.92. For the entire dry period, the highest value was 0.78, for the equation fitted to estimate dead grass fuel (Table 1). Similar results were reported by Tucker et al. [75], who obtained a higher coefficient of determination (R² = 0.80) using the 0.385 µm range of the electromagnetic spectrum as the predictor variable to estimate total dry biomass. Using Landsat 5 TM imagery to estimate aboveground biomass in interior Alaska, Ji et al. [9] obtained an R² value of 0.73 in their regression model. In one of the few studies conducted in the Cerrado, Franke et al. [21] found relationships between variables obtained from spectral mixture analysis and the biomass of fine surface fuels in their regression model, with R² values of 0.81 for the fraction of non-photosynthetic dry vegetation (FNPV) and 0.65 for the fraction-soil (FS). The results of Franke et al. [21] differ from those of the present study, in which the strongest relationships between the spectral mixture analysis variables and fuel load were found for the fraction-soil (FS) values. However, the linear regression models fitted by Franke et al. [21] were not cross-validated. Few studies have estimated surface fuel from satellite imagery, especially for the Cerrado biome.
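The evaluation metrics compared in this discussion (R², adjusted R², RMSE, and MAE) can be sketched as follows. These are the textbook definitions, assumed here because this section does not state the exact formulas used; the function names, the toy observed and predicted values, and the sample size in the adjusted R² example are hypothetical.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R², penalising the number of predictors relative to the sample size."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def rmse(observed, predicted):
    """Root-mean-square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy values (hypothetical), not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = {
    "R2": r2_score(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}

# Many predictors relative to a small (hypothetical) sample drive adjusted R² below zero.
example_adjusted = adjusted_r2(0.45, n_samples=34, n_predictors=26)
```

With an R² of 0.45, 26 predictors, and a hypothetical sample of 34 observations, the adjusted value is about -1.59, which illustrates how negative adjusted R² values can arise when the number of predictors approaches the sample size.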
As with the multiple linear regression equations, the best-performing Random Forest models, based on the evaluation metrics, were those estimating the dead grass fuel class and the total fine dead fuel class (grass + 1-h downed wood debris). They presented higher values than the other classes, reaching R² = 0.83. The statistical metrics obtained for estimating the 1-h downed wood debris fuel load showed poor performance relative to the other variables (Table 2). This may be explained by the lower spatial continuity of this fuel compared with the dead grass fuel present in the study area. Dube and Mutanga [76] used Landsat 8 OLI and 7 ETM+ images, with band reflectance and vegetation indices as predictor variables. They tested the Random Forest algorithm to estimate aboveground biomass in southern Africa and found coefficient of determination (R²) values ranging from 0.43 to 0.65. Pierce et al. [77] used the RF algorithm with information from Landsat 5 TM, field data, and topographic factors to model and map canopy fuels in California (USA) and attained pseudo-R² values ranging from 0.55 to 0.68. Frazier et al. [11] characterized the aboveground biomass in a boreal forest using Landsat temporal segmentation metrics and found R² values of 0.62 for Random Forest models. Gao et al. [14] found higher R² values (0.75) using MODIS sensor data to estimate aboveground biomass in a region in Asia.
Regarding variable importance in the best-fitting RF models for the dead fuel classes (dead grass, 1-h downed wood debris, and total fine dead fuels) and for total fine fuel (live and dead), the fraction-soil (FS) variable from the spectral mixture analysis (SMA) was the most important independent variable. These results corroborate the higher correlations between the FS variable and the variables obtained from the spectral mixture analysis. The physiognomy of Cerrado grassland is predominantly open (pure grassland and grassland with scattered shrubs and trees) and has a sparse population of tree species. Depending on the age of the fuel and its level of cover, greater or lesser soil exposure becomes quite noticeable in the response of the spectral mixture analysis, which showed an inverse relationship with surface fuel load. Despite the importance of the FS variable in the RF models, its performance in areas with a greater presence of tree species is unknown. At least five vegetation indices, namely NDII, GVMI, DER56, NBR, and MSI, had a %IncMSE importance above 10%. All these vegetation indices include the near-infrared (NIR: 0.85 - 0.88 µm) and short-wave infrared (SWIR1: 1.57 - 1.65 µm, SWIR2: 2.11 - 2.29 µm) channels in their calculations. This indicates the importance of the near-infrared and short-wave infrared channels for load estimates by both the RF models and the multiple linear regression analyses. The near-infrared and short-wave infrared channels have been used to estimate vegetation characteristics in several settings [48,69].
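To illustrate how such indices combine the NIR and SWIR channels, the sketch below computes three of them from surface reflectance. The formulas are the commonly used definitions and are an assumption here, since this section does not give the exact formulations adopted in the study (including which SWIR band enters NDII); GVMI and DER56 are omitted for the same reason. The reflectance values are hypothetical.

```python
def ndii(nir, swir1):
    """Normalized Difference Infrared Index (assumed form: NIR and SWIR1)."""
    return (nir - swir1) / (nir + swir1)


def nbr(nir, swir2):
    """Normalized Burn Ratio (assumed form: NIR and SWIR2)."""
    return (nir - swir2) / (nir + swir2)


def msi(nir, swir1):
    """Moisture Stress Index (assumed form: SWIR1 divided by NIR)."""
    return swir1 / nir


# Hypothetical reflectances for Landsat 8 OLI bands B5 (NIR), B6 (SWIR1), B7 (SWIR2).
pixel = {"nir": 0.30, "swir1": 0.25, "swir2": 0.15}

indices = {
    "NDII": ndii(pixel["nir"], pixel["swir1"]),
    "NBR": nbr(pixel["nir"], pixel["swir2"]),
    "MSI": msi(pixel["nir"], pixel["swir1"]),
}
```

Because each index is a ratio of bands, it is less sensitive to overall brightness differences than the raw reflectance values, which is one reason such indices are popular predictors.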
Overall, the statistical metrics were better for the RF models than for the multiple linear regression equations and the k-NN models. Thus, the Random Forest models built with data retrieved from satellite images provided better estimates of the surface fuel of Cerrado grassland. Only the total fine fuel variable (live and dead) estimated by multiple linear regression had a slightly higher coefficient of determination (R²), considering the entire dry period in the study area. However, its RMSE and MAE values were higher than those of the Random Forest algorithm. In line with the results of this study, D'Este et al. [78] evaluated the performance of machine learning models in estimating the fine fuel load in a region of Italy. They observed greater predictive power for the Random Forest algorithm (R² = 0.50) than for multiple linear regression and support-vector machines, which presented coefficients of determination of 0.40 and 0.39, respectively. In the present study, despite the different fuel characteristics, RF performed better in estimating the fine dead fuel load (<0.64 cm diameter), with R² = 0.83.
The k-NN algorithm performed unsatisfactorily in estimating the surface fuel of the Cerrado grassland physiognomy, with poorer statistical metrics than the multiple linear regression analyses and the Random Forest algorithm. This may be because the k-NN algorithm performs better and has been more widely applied in estimating forest variables [79,80,81], and because little is known about its use in assessing surface fuels.
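For readers unfamiliar with k-NN regression, the sketch below shows the basic mechanism and one way of choosing K by minimising a cross-validated RMSE. The leave-one-out scheme, the candidate K values, the function names, and the synthetic data are assumptions made for illustration; this section does not describe the procedure used to select K in the study.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Predict the mean target of the k training points nearest to the query."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return sum(y for _, y in nearest) / k


def loo_rmse(xs, ys, k):
    """Leave-one-out RMSE: each sample is predicted from all the others."""
    squared_error = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        squared_error += (ys[i] - knn_predict(train_x, train_y, xs[i], k)) ** 2
    return math.sqrt(squared_error / len(xs))


def choose_k(xs, ys, candidates):
    """Return the candidate K with the lowest leave-one-out RMSE, plus all scores."""
    scores = {k: loo_rmse(xs, ys, k) for k in candidates}
    return min(scores, key=scores.get), scores


# Synthetic one-predictor data (hypothetical): a linear trend with alternating noise.
xs = [(i / 10.0,) for i in range(20)]
ys = [2.0 * x[0] + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

best_k, scores = choose_k(xs, ys, candidates=(1, 3, 5, 7, 9))

# Small hand-checkable case: the three nearest targets to x = 1.0 are 2.0, 1.0, and 3.0.
toy_prediction = knn_predict([(0.0,), (1.0,), (2.0,), (10.0,)], [1.0, 2.0, 3.0, 10.0], (1.0,), 3)
```

Because each prediction is an average of observed neighbours, k-NN cannot extrapolate beyond the range of the training targets, and its accuracy depends on how densely the predictor space is sampled.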
5. Conclusions
The multiple linear regression analyses showed better statistical results for the equations fitted for the beginning of the dry season (May and June) than for those fitted for the end of the dry season (August and September). This behavior arises from the changes in the fuel characteristics of the Cerrado grassland physiognomy between the beginning and end of the dry season. Therefore, models estimating load, moisture, and other surface fuel characteristics should be fitted separately and should account for seasonality throughout the year.
The Random Forest algorithm improved the evaluation metrics for estimating the Cerrado grassland surface fuel load relative to the multiple linear regression analyses and the k-Nearest Neighbor algorithm. Among the predictor variables derived from Landsat 8 OLI images, the fraction-soil variable from the spectral mixture analysis was the most important for load estimation in the different classes analyzed herein. Accordingly, applying the RF algorithm with the fraction-soil variable is recommended for estimating the fuel load of open or savanna physiognomies of the Cerrado. However, their performance in more closed physiognomies, with a greater presence of tree species, is unknown. Further studies must be conducted to verify the feasibility of using the products of different satellite images in different environments, especially in the areas of the biome more prone to fires.
Author Contributions
Conceptualization, M.M.S. and A.C.B.; methodology, M.M.S., A.C.B., and M.G.; software, E.H.R., A.D.P.S., and J.N.C.; validation, M.M.S. and A.D.P.S.; formal analysis, D.B.B., G.R.S., A.C.B., and M.G.; investigation, M.M.S., A.D.P.S., and J.N.C.; resources, M.G. and A.C.B.; data curation, M.M.S. and A.D.P.S.; writing (original draft preparation), M.M.S.; writing (review and editing), M.M.S. and A.C.B.; visualization, E.H.R., G.R.S., and D.B.B.; supervision, A.C.B. and M.G.; project administration, M.M.S. and M.G.; funding acquisition, M.G. and A.C.B. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Study area and spatial distribution of data sampling units.
Figure 2.
Overview of fuel sampling. (A) Physiognomy of some sampling areas; (B) Fuel data sampling unit; (C) Sub-samples of surface fuel characterization; (D) Number of samples taken at the beginning (May and June) and end of the dry season (August and September).
Figure 3.
Flowchart of the methodology applied in this study. R² = coefficient of determination, R²adj = adjusted coefficient of determination, RMSE = root-mean-square error, MAE = mean absolute error.
Figure 4.
Fine fuel load mappings performed using the best RF models: (A, B) Estimates for dead grass fuel, (C, D) Estimates for 1-h downed wood debris.
Figure 5.
Fine fuel load mappings performed using the best RF models: (A, B) Estimates for total fine dead fuel, (C, D) Estimates for total fine fuel.
Figure 6.
Importance of the variables in the best-fitting Random Forest (RF) models for fuel load estimation. (a) RF model for dead grass fuel; (b) RF model for 1-h downed wood debris; (c) RF model for total fine dead fuel; (d) RF model for total fine fuel. %IncMSE indicates the percentage increase in the mean squared error.
Figure 7.
Fuel condition map consisting of the three sub-pixel fraction images (R: NPV, G: GV, B: Soil) in the Serra Geral do Tocantins Ecological Station for the beginning (May 2017) and end of the dry season (September 2017).
Table 1.
Statistical metrics of the cross-validation of the multiple linear regression models with the best fit for fuel load estimation.
| Period | Estimated fuel (y) | Predictor variables (x) | R² | R²adj | RMSE | MAE |
|---|---|---|---|---|---|---|
| 1 | Dead grass | NPV; Soil; B5: nir; B6: swir 1; B7: swir 2; VARI; SR; SIPI; NDII; MSR; MSI; GVMI; DER56; DER34 | 0.89 | 0.81 | 0.37 | 0.32 |
| 1 | 1-h downed wood debris | NPV; Soil; GV; B4: red; B5: nir; B6: swir 1; B7: swir 2; NDVI; VARI; VIgreen; SR; SIPI; SAVI; NDWI; NRR; MSI; MNDWI; DER34; DER23 | 0.81 | 0.55 | 0.45 | 0.37 |
| 1 | Total fine dead (grass + 1-h) | NPV; Soil; GV; B4: red; B5: nir; B6: swir 1; B7: swir 2; VARI; VIgreen; SR; SIPI; SAVI; NDWI; NDII6; NBR; DER45 | 0.92 | 0.84 | 0.49 | 0.40 |
| 1 | Total fine (live and dead) | NPV; Soil; B2: blue; B4: red; B5: nir; B6: swir 1; NDVI; VARI; VIgreen; SR; SIPI; SAVI; NDWI; NDII6; NBR; MSR; MSI; MNDWI; INTEGRAL; DER45; DER34; DER23; GVMI | 0.81 | 0.37 | 1.26 | 1.06 |
| 2 | Dead grass | NPV; Soil; B2: blue; B3: green; B5: nir; B6: swir 1; B7: swir 2; NDVI; VIgreen; SR; NDWI; NDII6; NBR; MSR; MSI; MNDWI; GVMI; EVI; DER56; DER34; DER23 | 0.81 | 0.47 | 0.44 | 0.38 |
| 2 | 1-h downed wood debris | NPV; Soil; GV; B2: blue; B3: green; B4: red; B5: nir; B7: swir 2; NDVI; VARI; VIgreen; SR; SIPI; SAVI; NDWI; NDII6; NRR; MSR; MSI; INTEGRAL; GVMI; EVI; DER56; DER45; DER34; DER23 | 0.45 | -1.59 | 1.90 | 1.53 |
| 2 | Total fine dead (grass + 1-h) | Soil; GV; B2: blue; B3: green; B4: red; B5: nir; B6: swir 1; B7: swir 2; NDVI; VARI; VIgreen; SR; SIPI; SAVI; NDWI; NDII6; NBR; MSR; MSI; INTEGRAL; GVMI; EVI; DER56; DER45; DER34; DER23 | 0.64 | -0.72 | 2.24 | 1.98 |
| 2 | Total fine (live and dead) | Soil; GV; B2: blue; B3: green; B4: red; B5: nir; B6: swir 1; B7: swir 2; NDVI; VARI; VIgreen; SR; SAVI; NDWI; NDII6; MSR; MSI; INTEGRAL; EVI; DER56; DER45; DER34; DER23 | 0.56 | -0.44 | 2.27 | 2.01 |
| 3 | Dead grass | Soil; B3: green; B4: red; B5: nir; B6: swir 1; VARI; VIgreen; SR; SAVI; NDWI; NDII6; MSR; INTEGRAL; GVMI; EVI; DER56 | 0.78 | 0.72 | 0.36 | 0.31 |
| 3 | 1-h downed wood debris | NPV; Soil; B2: blue; B4: red; B7: swir 2; VARI; SR; SIPI; NDWI; NRR; MSI; INTEGRAL; GVMI; EVI; DER45 | 0.57 | 0.45 | 0.62 | 0.51 |
| 3 | Total fine dead (grass + 1-h) | NPV; Soil; B7: swir 2; VARI; VIgreen; SIPI; NBR; MSR; MSI; INTEGRAL; GVMI; EVI; DER45 | 0.73 | 0.66 | 0.75 | 0.62 |
| 3 | Total fine (live and dead) | NPV; Soil; GV; B2: blue; B4: red; B5: nir; B6: swir 1; B7: swir 2; VARI; SIPI; NDWI; NBR; DER56; DER34 | 0.63 | 0.54 | 1.06 | 0.91 |
Table 2.
Statistical metrics used to assess the models generated by the Random Forest algorithm.
| Estimated fuel (y) | Predictor variables (x) | R² | R²adj | RMSE | MAE |
|---|---|---|---|---|---|
| Dead grass | All predictor variables | 0.83 | 0.71 | 0.33 | 0.24 |
| Dead grass | * Soil; B3: green; B4: red; B5: nir; B6: swir 1; VARI; VIgreen; SR; SAVI; NDWI; NDII6; MSR; INTEGRAL; GVMI; EVI; DER56 | 0.83 | 0.78 | 0.33 | 0.23 |
| 1-h downed wood debris | All predictor variables | 0.59 | 0.30 | 0.58 | 0.44 |
| 1-h downed wood debris | * NPV; Soil; B2: blue; B4: red; B7: swir 2; VARI; SR; SIPI; NDWI; NRR; MSI; INTEGRAL; GVMI; EVI; DER45 | 0.52 | 0.38 | 0.61 | 0.46 |
| Total fine dead (grass + 1-h) | All predictor variables | 0.83 | 0.71 | 0.59 | 0.44 |
| Total fine dead (grass + 1-h) | * NPV; Soil; B7: swir 2; VARI; VIgreen; SIPI; NBR; MSR; MSI; INTEGRAL; GVMI; EVI; DER45 | 0.79 | 0.74 | 0.63 | 0.49 |
| Total fine (live and dead) | All predictor variables | 0.62 | 0.35 | 0.89 | 0.75 |
| Total fine (live and dead) | * NPV; Soil; GV; B2: blue; B4: red; B5: nir; B6: swir 1; B7: swir 2; VARI; SIPI; NDWI; NBR; DER56; DER34 | 0.55 | 0.43 | 0.96 | 0.81 |
Table 3.
K values chosen for each k-Nearest Neighbor algorithm model.
| Estimated fuel (y) | Predictor variables (x) | RMSE value | Chosen K value |
|---|---|---|---|
| Dead grass | All predictor variables | 0.4072 | 5 |
| Dead grass | Stepwise* | 0.4453 | 7 |
| 1-h downed wood debris | All predictor variables | 0.8183 | 7 |
| 1-h downed wood debris | Stepwise* | 0.8253 | 9 |
| Total fine dead (grass + 1-h) | All predictor variables | 1.1051 | 5 |
| Total fine dead (grass + 1-h) | Stepwise* | 1.1233 | 9 |
| Total fine (live and dead) | All predictor variables | 1.6060 | 7 |
| Total fine (live and dead) | Stepwise* | 1.6136 | 13 |
Table 4.
Statistical metrics used to assess the models generated by the k-NN algorithm.
| Estimated fuel (y) | Predictor variables (x) | R² | R²adj | RMSE | MAE |
|---|---|---|---|---|---|
| Dead grass | All predictor variables | 0.68 | 0.45 | 0.37 | 0.30 |
| Dead grass | Stepwise* | 0.61 | 0.49 | 0.41 | 0.34 |
| 1-h downed wood debris | All predictor variables | 0.34 | -0.13 | 0.74 | 0.61 |
| 1-h downed wood debris | Stepwise* | 0.31 | 0.11 | 0.76 | 0.63 |
| Total fine dead (grass + 1-h) | All predictor variables | 0.54 | 0.21 | 0.92 | 0.78 |
| Total fine dead (grass + 1-h) | Stepwise* | 0.49 | 0.37 | 0.98 | 0.84 |
| Total fine (live and dead) | All predictor variables | 0.38 | -0.07 | 1.43 | 1.23 |
| Total fine (live and dead) | Stepwise* | 0.30 | 0.12 | 1.53 | 1.31 |