1. Introduction
Tropical glaciers are sensitive indicators of climate change [1] and vital dry-season sources of drinking water, agriculture, and livelihoods for the many people who depend upon them [2].
Tropical Andean glaciers are among the fastest-shrinking glaciers and largest contributors to sea-level rise on Earth. Over the period 1975–2020, glaciers in the Southern Peruvian Andes receded by ∼32% and are now at less than half their original size [3,4]. The most recent studies report a glacier recession of −46% in the Cordillera Blanca between 1930 and 2016 [5]. This has brought about spatiotemporal alterations in both the quantity and quality of mountain water resources [2,6,7], and reliance on glacier meltwater increases sharply during drought conditions [8].
Accurate extraction of glacier areas and continuous monitoring of glacier morphology are fundamental prerequisites for glacier research, as they provide crucial information for assessing hydrologic risks and facilitating climate change mitigation efforts [9].
For example, recent studies focusing on hydrologic modeling of glacier contribution to watersheds in the tropical Andes have utilized multi-temporal estimates of glacier area derived from remotely sensed data to describe the effect that glacier change has had, and will have, on water resources in this region [10,11].
Due to the arduous, time-consuming, and subjective nature of manually delineating glacier boundaries, numerous techniques have been devised to delineate glacier outlines automatically, primarily from multispectral imagery [1]. These techniques have been shown to achieve accuracy equivalent to manual digitization when a large sample of glaciers is analyzed [12]. Several studies have mapped glaciated areas at local and regional scales in Peru using remote sensing techniques [1,6,11,13], as well as at the continental scale [9].
The prevailing and currently operational technique for delineating debris-free glaciers in the Peruvian Andes relies on applying a Normalized Difference Snow Index (NDSI) threshold [12,14]. However, different threshold values can yield varying results [15]. Although the threshold may need to be adjusted for specific regions [10,13], it is commonly kept constant at approximately 0.35 to 0.55 in the literature [1].
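To make the procedure concrete, NDSI thresholding can be sketched in a few lines. This is a minimal illustration, not the operational configuration: the band assignments (Landsat 8 green and SWIR1) are standard, but the reflectance values and the 0.4 cut-off are illustrative assumptions.

```python
import numpy as np

def ndsi_glacier_mask(green, swir1, threshold=0.4):
    """Classify debris-free glacier/snow pixels by thresholding the NDSI.

    green, swir1: reflectance arrays (e.g., Landsat 8 bands 3 and 6).
    threshold: NDSI cut-off; values around 0.35-0.55 are common.
    Returns a boolean mask (True = glacier/snow).
    """
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    denom = green + swir1
    # Guard against division by zero on no-data pixels.
    ndsi = np.where(denom != 0, (green - swir1) / denom, 0.0)
    return ndsi > threshold

# Toy example: a snow pixel (high green, low SWIR) and a rock pixel.
mask = ndsi_glacier_mask([0.8, 0.3], [0.1, 0.4])
print(mask)  # [ True False]
```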
Machine learning (ML) is commonly used to build geospatial prediction models [16,17,18,19,20] and has been widely employed in geoscientific research such as global soil property mapping [21,22,23], landslide susceptibility [24], climate change studies [25], and wildfire risk mapping. In particular, several machine learning approaches have been suggested for glacier mapping [26,27,28,29]. To be sure that observed glacier changes reflect real changes rather than imprecise determination of the outline, the accuracy of the outlines must be known [30]. This measure is also needed to assess the significance of any derived relative changes in glacier size.
Geospatial prediction of environmental variables often relies on map accuracy assessment [31,32]. Classical map accuracy assessment is rooted in sampling theory [33,34], through which an unbiased estimate of map accuracy is obtained.
Several approaches can be chosen to accomplish accuracy assessment [35]. A classical procedure is to evaluate model performance and associated errors by randomly selecting a number of test observations that are set aside at the model calibration stage and used only to quantify model prediction error [36]. While the best method is simply to use a completely independent test sample, this is not always feasible [33,34].
A possible solution is k-fold cross-validation (K-CV) [37,38,39]. K-CV is a resampling-based technique for estimating a model's predictive performance [38]. The basic idea behind K-CV is to split an existing dataset into training and test sets using a user-defined number of partitions: the dataset is divided into k partitions or folds, the training set consists of k − 1 folds, and the test set of the remaining fold. The model is trained on the training set and evaluated on the test fold, and the procedure finishes when each fold has served as the test set exactly once.
A further improvement of this approach is to repeat the whole K-CV procedure, producing a new random partition each time. This approach is called the repeated k-fold CV method (RK-CV).
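The K-CV/RK-CV splitting logic described above can be sketched as follows. This is a minimal illustration with arbitrary sample counts, not the benchmark code used in this study:

```python
import numpy as np

def repeated_kfold_indices(n, k=5, repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV (RK-CV).

    Each repetition shuffles the n sample indices and splits them into
    k folds; within a repetition, every fold serves as the test set
    exactly once while the remaining k - 1 folds form the training set.
    """
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        idx = rng.permutation(n)
        folds = np.array_split(idx, k)
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train, test

# 100 samples, 5 folds, 2 repetitions -> 10 train/test pairs.
splits = list(repeated_kfold_indices(n=100, k=5, repeats=2))
print(len(splits))  # 10
```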
In environmental sciences, observations are often spatially dependent [24,40] and are thus affected by underlying spatial autocorrelation to a varying degree. Although cross-validation is a well-established approach for assessing supervised machine-learning models [39,41,42,43], it is generally agreed that most ML methods in spatial applications do not consider relative location and neighborhood features, analyzing pixels regardless of their surroundings [44] and hence completely ignoring the spatial dependence and heterogeneity of spatial processes [16]. Therefore, the direct application of ML to geospatial data without accounting for potential spatial autocorrelation could lead to biased outcomes. It is clear from many studies [40,44,45] that unaddressed spatial autocorrelation generates problems such as overoptimistic model fit, omitted information, and biased (suboptimal) performance estimates: conventional cross-validation leads to optimistically biased estimates of map accuracy [20,36,43] because training and test data are similar in a non-spatial partitioning setup, whether cross-validation is used for tuning or for validation [36,43,46].
Several papers, including [36,47], have investigated these over-optimistic accuracy estimates and promoted SP-CV methods. These methods start from the premise that spatial proximity between data points in the calibration and test folds is to be avoided, which is commonly achieved by spatial blocking in K-CV [39,47]. The general view in the literature [44] is that visible progress has been made recently in the development of spatial machine learning modelling; the current standard is to combine a classic ML algorithm with some form of spatial cross-validation method (SP-CV). Therefore, cross-validation approaches that adapt to this problem should be used in any performance evaluation involving spatial data [48,49].
A common approach is block-CV (BLK-CV), which divides all samples into contiguous blocks and then avoids selecting samples within the same block as both training and validation samples [47]. As an example, [36,43] used K-Means clustering to split samples into five folds based on the samples' locations. The validity of this procedure was empirically tested by [36,40,43], who found that BLK-CV better approximates the error obtained when predicting species distributions and above-ground forest biomass (AGB).
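As an illustration of spatial blocking, the sketch below assigns samples to contiguous rectangular blocks based on their coordinates. This is a simplified grid-based variant (the cited studies used K-Means clustering on sample locations), and the block counts and coordinates are arbitrary assumptions:

```python
import numpy as np

def spatial_block_folds(x, y, n_blocks_x=3, n_blocks_y=2):
    """Assign each sample to a contiguous spatial block (BLK-CV fold id).

    Samples in the same block are never split between training and
    validation, which limits leakage caused by spatial autocorrelation
    between nearby training and test pixels.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Equal-width breaks along each axis give a block index per sample.
    bx = np.clip(((x - x.min()) / (np.ptp(x) + 1e-12) * n_blocks_x).astype(int),
                 0, n_blocks_x - 1)
    by = np.clip(((y - y.min()) / (np.ptp(y) + 1e-12) * n_blocks_y).astype(int),
                 0, n_blocks_y - 1)
    return bx + n_blocks_x * by

# 300 random sample locations split into a 3 x 2 grid of blocks.
rng = np.random.default_rng(1)
xs, ys = rng.uniform(0, 10, 300), rng.uniform(0, 10, 300)
folds = spatial_block_folds(xs, ys)
print(sorted(set(folds.tolist())))  # six fold ids, 0..5
```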
Overall, the intention of this work is to demonstrate the need for spatial cross-validation when using machine learning for geospatial classification of glacier land cover whenever the aim is to estimate a bias-reduced predictive performance. The following objectives (and hypotheses) are addressed:
The study compares the predictive performance in glacier mapping of machine learning algorithms (k-nearest neighbors, Random Forest, Gradient Boosting Machines) and a classical statistical model (logistic regression). It is expected that the machine learning algorithms will demonstrate significantly higher predictive performance.
The study also examines the predictive performance of classification algorithms when spatial and non-spatial cross-validation methods are used. It is hypothesized that non-spatial partitioning methods will yield over-optimistic results due to the presence of spatial autocorrelation.
Additionally, the study investigates the effects of spatial clustering on the distribution of errors in the analyzed algorithms for glacier mapping. It is anticipated that non-spatial models will exhibit spatial autocorrelation in their errors.
3. Results and Discussion
3.1. Spatial Autocorrelation
Semivariograms were calculated for the glacier-class indicator variable in each glacier area under study to confirm the presence of spatial autocorrelation in the datasets.
Figure 4 shows the experimental and fitted indicator variograms for each glacier dataset for 1000 test pixels; the semivariograms were rescaled by the variance of each indicator variable to facilitate comparison across datasets. The parameters of the indicator variogram models of the 9 datasets are given in Table 3.
In general, the glacier class sampled at 30 m resolution clearly presents significant spatial correlation. The smallest ranges of spatial dependence were found for cordilleras Central, Urubamba, and Huallanca (0.37, 0.68, and 0.87 km, respectively), and the largest range was found for cordillera Blanca (5.42 km).
For some glacier regions the indicator semivariograms display a very large nugget effect and a short-range structure (i.e., Central, Huaytapallana, and Urubamba), as well as larger kappa parameters, indicating a weakly structured spatial process in those cases. In general, however, relatively larger ranges and smaller nugget effects occur in the majority of the other regions. For instance, in the cordillera Blanca case, the indicator variogram shows that, at 30 m spatial resolution, the glacier class presents significant spatial correlation up to 5.42 km (Figure 4). This spatial autocorrelation can be observed in almost all cases, where clusters of homogeneous glacier-class values are present. The cordillera Blanca region exhibited a larger range due to its considerably larger area (13,963 km²). The absence of similar range behavior in other areas can be attributed to substantial differences in the overall size and extent of each region, because glaciers assume a size and flow rate in balance with the local climate [15].
Given the relatively high sampling intensity (and resulting proximity) of the glacier-class pixels selected for this work, and the long range of spatial autocorrelation in the data (i.e., up to 5 km), any given randomly selected test pixel will not be independent of a large number of neighboring pixels, thereby violating the core assumption of model validation (i.e., the independence between training and test sets). This result probably does not hold for the regions whose variograms revealed poor spatial structure, such as the Central, Huaytapallana, and Urubamba cordilleras. This suggests that a spatial cross-validation method could be useful in this context [36].
It is worth noting that no stationarity assumptions are needed, since the indicator variograms are used simply as a means to describe the spatial structure of the data rather than to perform model-based inferential spatial prediction of a spatial process.
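The empirical indicator semivariogram behind these figures can be illustrated with a minimal estimator. This is a simplified sketch on toy coordinates (two well-separated clusters of identical indicator values), not the model-fitting procedure used for Table 3:

```python
import numpy as np

def indicator_semivariogram(coords, z, lags):
    """Empirical semivariogram of a 0/1 indicator variable.

    gamma(h) = mean of 0.5 * (z_i - z_j)^2 over point pairs whose
    separation falls in each lag bin, rescaled by the indicator
    variance so curves are comparable across datasets.
    """
    coords = np.asarray(coords, float)
    z = np.asarray(z, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(z), k=1)               # each pair once
    dist, sq = d[iu], 0.5 * (z[iu[0]] - z[iu[1]]) ** 2
    var = z.var()
    gammas = []
    for lo, hi in zip(lags[:-1], lags[1:]):
        sel = (dist >= lo) & (dist < hi)
        gammas.append(sq[sel].mean() / var if sel.any() else np.nan)
    return np.array(gammas)

# A spatially clustered indicator field: low semivariance at short lags
# (pairs within a cluster agree), high semivariance at long lags.
coords = np.array([[0, 0], [0, 1], [0, 2], [10, 0], [10, 1], [10, 2]], float)
z = np.array([1, 1, 1, 0, 0, 0])
g = indicator_semivariogram(coords, z, lags=np.array([0.0, 3.0, 12.0]))
print(g)  # [0. 2.]
```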
3.2. Model Specification
For the benchmark, all models were trained using default hyperparameter values or recommendations from the literature tailored to our data. A detailed summary of the models and hyperparameters is shown in Table 4.
3.3. Spatial Vs. Non-Spatial CV Comparison Experiment
Figure 5 shows the results of our experimental benchmark, from which the models with superior and inferior performance in both scenarios can be identified. In the NSP-CV setting, k-nearest neighbors (KNN) was the overall best model across regions, consistently demonstrating the highest performance, followed by logistic regression (LR) and random forest (RF). The SP-CV setting generally showed lower MCC values; KNN remained the overall best model in almost all regions, followed by gradient boosting machines (GBM) and logistic regression (LR) in Cordillera Blanca and Central, respectively. Some studies have found that KNN shows the best predictive performance in spatial settings [81,99], although this is generally not the case [43,100,101,102].
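For reference, the Matthews correlation coefficient (MCC) used throughout the benchmark can be computed from the binary confusion matrix. This is a minimal sketch with toy labels; real evaluations would typically use a library implementation:

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary classification.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    +1 is perfect agreement, 0 is chance level, -1 total disagreement.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))            # 1.0
print(round(mcc([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.577
```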
Although SP-CV generally shows lower MCC results than NSP-CV, this is not always the case, especially in Cordillera Central, Raura, Urubamba, and Vilcanota. We hypothesize that these results are connected with the degree of spatial autocorrelation/clustering of the glacier class in these specific Cordilleras. For example, Cordilleras Central and Urubamba present almost a pure nugget effect, which could have generated the mixed response of MCC to both validation approaches. Hence, it is impossible to distinguish the effect of SP-CV from NSP-CV in the presence of weak spatial autocorrelation.
Surprisingly, logistic regression (LR) demonstrates good performance in some cases in this quasi-linear problem, without clear evidence of being surpassed by models such as random forest (RF) and boosted trees (GBM), as is commonly acknowledged in other geospatial contexts [17,43,74]. This result shows the continued relevance of traditional parametric approaches in spatial modeling, making this algorithm a valid choice, especially when the differences in predictive accuracy compared to black-box models are small.
Concerning the poor performance of random forest (RF), the literature generally regards it as a "go-to" model in geospatial contexts [17,43,60,74,103]. RF uses "bagging" (bootstrap aggregation). As spatial data are correlated, this resampling violates the assumption of independent and identically distributed (i.i.d.) data units, which is fundamental to bootstrapping. [104] provided evidence that this limitation can lead to inferior prediction performance of RF under spatial dependence, which could explain the observed performance of RF.
There were notable differences in the distribution of the estimated MCC between the spatial and non-spatial settings.
Figure 6 shows the results for both CV methods for each glacier region under study, along with the reference error shown as red dotted horizontal lines.
First and foremost, in almost all cases the SP-CV and NSP-CV methods produced quite different results in terms of MCC. It is also remarkable that the proposed SP-CV, which considers the clustered nature of the data, produces evaluation results closer to the reference prediction MCC than NSP-CV in almost all cases. This suggests that SP-CV may produce bias-reduced performance estimates when using ML models for glacier mapping, especially when training locations are scarce and highly clustered, as is usual in glacier monitoring and mapping. As expected, the SP-CV results showed high variance for all models.
Upon careful examination of Figure 6, quite large differences between the CV MCCs and the test MCC (dotted red lines in Figure 6) can be found for Cordillera Central, Urubamba, and Vilcabamba, regardless of the validation approach. It seems that, in those particular cases, both CV methods are incapable of estimating the true error of the models. This indicates the presence of biased results even when employing the spatial cross-validation approach suggested in this study. Although there could be multiple reasons for these results, we hypothesize that they are due to possible overfitting of the models in those specific areas.
A more detailed analysis can be found in Table 5, which shows the mean MCC grouped by Cordillera and CV method, as well as the difference with respect to the reference test MCC (in parentheses). For example, the Cordillera Blanca mean MCC for NSP-CV models is 0.949, while the mean MCC for SP-CV models is 0.928; their differences with respect to the reference independent test (i.e., 0.844) are 0.1054 and 0.0845, respectively.
Overall, the spatial 5-fold CV led to a slight decline in the models' MCC (about −4%) with respect to the non-spatial k-fold method. Likewise, SP-CV yields a sharp reduction in bias compared to the reference MCC evaluated on the test sets (about −18%).
Furthermore, SP-CV's evaluation results were much closer to the reference prediction error than those of NSP-CV for all glacier regions under study. This shows that the proposed method, which considers the spatial structure of the data in the evaluation of the model, could indeed provide a reasonably unbiased result. To further strengthen this observation, it is important to analyze each model's discrepancy between the SP-CV and NSP-CV MCC estimates obtained in the benchmark settings, which is explored in the next section.
3.4. Statistical Comparison of Model Results
We carried out a comparative statistical analysis of the results using paired statistical tests to compare the performances of all the considered models.
Table 6 shows the t-statistics and p-values (in parentheses) of paired t-test comparisons for the MCC results obtained after the 50-repeated 5-fold CV using the two types of cross-validation for each glacier region (a p-value < 0.05 means that there is a statistically significant difference between cross-validated MCC results, and a p-value > 0.05 means that both cross-validation approaches yield equivalent results).
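The paired comparison reported in Table 6 can be illustrated with synthetic fold-level MCC vectors. The numbers below are simulated assumptions, not the study's results; they only show the mechanics of the test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical paired fold-level MCC estimates for one model under the
# two CV schemes (50-repeated 5-fold CV gives 250 pairs per model).
mcc_nsp = rng.normal(0.95, 0.02, 250)            # non-spatial CV
mcc_sp = mcc_nsp - rng.normal(0.02, 0.01, 250)   # spatial CV, slightly lower

# Paired t-test on the per-fold differences.
t_stat, p_val = stats.ttest_rel(mcc_nsp, mcc_sp)
print(t_stat > 0, p_val < 0.05)  # NSP-CV significantly higher in this toy case
```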
Based on the paired t-test results, it cannot be definitively concluded that spatial cross-validation (SP-CV) yields significantly different Matthews correlation coefficient (MCC) estimates compared to non-spatial cross-validation (NSP-CV) in the majority of cases, with the exception of some notable cases that need further clarification:
For Cordillera Blanca, only the GBM and KNN models showed significant differences (p-value < 0.05). For cordilleras Huallanca, Huayhuash, and Vilcabamba, RF was the only model that showed significant differences. For Cordillera Huaytapallana, only GBM showed significant differences. For Cordillera Urubamba, only LR showed significant differences.
Some algorithms showed no sensitivity to the cross-validation method (i.e., KNN and LR). Since KNN is the best model overall in all cases regardless of the CV method, this suggests that KNN can produce more reliable error estimates while also yielding the best-performing predictions. On the other hand, some algorithms were quite sensitive to the cross-validation method; in particular, RF showed significant differences between SP-CV and NSP-CV MCCs. In those cases, the null hypothesis of a negligible average difference between the models' errors (i.e., MCCs) is rejected. This difference can therefore be attributed to an overoptimistic bias in non-spatial performance estimates in the presence of spatial autocorrelation [36,43,46,72,74].
However, it should be noted that these results may vary slightly depending on the data distribution within each fold and the randomization in each repetition, which can impact the comparison tests [43]. Nonetheless, given the considerable number of repetitions in the experiments, these results are generally robust.
While our results align with previous studies, such as those by [43,105] and [45], suggesting that non-spatial performance estimates tend to be significantly "superior" to spatial performance estimates, it is important to note that our experiments do not allow such a definitive statement in all cases. The observed differences between spatial and non-spatial performance estimates in our study may not be as pronounced or statistically significant regardless of the classification algorithm.
3.5. Spatial Autocorrelation of Errors
It is traditionally acknowledged that, when applying standard ML models in a spatial prediction setting, errors can remain spatially dependent due to a failure to account for spatial dependency or heterogeneity in important explanatory variables or in the model architecture [72,106,107].
Correspondences between predicted and labeled class data were indicator coded. If the class of a test pixel matched the predicted class, an indicator code of 1 was assigned to that sample pixel; conversely, an indicator code of 0 was given to sites where the predicted class differed from the test class. These values were identified as classification errors, obtained through the 50-repeated 5-fold spatial and non-spatial cross-validation (i.e., SP-CV and NSP-CV) and then aggregated through a simple majority vote over all repetitions. Next, we analyzed the spatial autocorrelation of the indicator-coded data using Moran's I.
Figure 7 shows that, regardless of the type of modeling algorithm, there were low Moran's I (MI) values with statistically non-significant p-values (i.e., p-values greater than 0.05). At first it seems that SP-CV could reduce the spatial dependence of errors compared to NSP-CV, which generally shows higher MI values (i.e., spatially correlated errors), suggesting that under our SP-CV validation method the spatial structure of errors is completely eliminated in all glacier regions but not under NSP-CV. A closer inspection of the p-values (not shown in the figure), however, suggests that the null hypothesis should not be rejected and that the spatial error patterns are almost random in both cross-validation scenarios.
Urubamba was the only exception in terms of significant MI p-values, especially when the KNN model was applied, indicating the occurrence of spatially correlated errors. Overall, after either the SP-CV or NSP-CV method, the spatial structure of the errors was completely absorbed by almost all models in all glacier regions (statistically non-significant p-values), regardless of the CV approach.
These results suggest that our expectation that including spatial information in the SP-CV method would capture the spatial autocorrelation of errors better than NSP-CV does not hold for glacier classification errors in the Tropical Andes using machine learning algorithms.
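The Moran's I statistic applied to the indicator-coded errors can be sketched as follows. This is a minimal global Moran's I with a binary distance-band weight matrix; the coordinates and the error pattern are toy assumptions chosen so that clustered errors yield a strongly positive statistic:

```python
import numpy as np

def morans_i(values, coords, band=1.5):
    """Global Moran's I with a binary distance-band spatial weight matrix.

    I near its expectation E[I] = -1/(n-1) suggests spatially random
    errors; I well above it suggests clustered (autocorrelated) errors.
    """
    z = np.asarray(values, float) - np.mean(values)
    coords = np.asarray(coords, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = ((d > 0) & (d <= band)).astype(float)   # neighbors within the band
    n, s0 = len(z), w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)

# Clustered 0/1 "error" pattern along a transect: correctly classified
# pixels on the left, misclassified pixels on the right.
coords = np.array([[i, 0.0] for i in range(10)])
errors = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
print(round(morans_i(errors, coords), 3))  # strongly positive (about 0.78)
```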
3.6. Spatial Predictions
Finally, we generated glacier outline maps over all studied regions using the trained models to visually inspect and compare the predictions. To predict the glacier class we took the approach of [108], described in [45]: we used all available pixels of each area to fit the final prediction models. As the error estimates from such a model are invalid, the error estimates derived from the blocked cross-validation should still be used. This approach favors final prediction quality over perfect accuracy of the error estimates. It has the advantage of using all the data, and thus likely being the best predictor, particularly for smaller datasets; it has the disadvantage that the error estimates from the cross-validation no longer apply perfectly to the predictions, as they were made with slightly different models.
To analyze the predictions in greater detail, we show a set of predictions on four small areas within Cordillera Blanca (Figure 8). Overall, the spatial distribution of the model predictions is quite similar. Here, KNN is not clearly the best predictor all around, as all the outlined maps exhibit strong agreement with the NGI outline. There are, however, certain areas showing overestimation and underestimation of glacier surfaces. Notably, overestimation is visible near the glacier periphery of the NGI in inset 3. In inset 2 we observe some areas of underestimation, where glacier areas were misclassified as non-glacier, especially in the case of the KNN and RF algorithms.
These classification errors could be attributed, to some extent, to the compositing and cloud filtering of Landsat 8 images in GEE. This filtering process might lead to the improper detection of transient snow areas, which are then mistaken for glaciers. Importantly, this phenomenon is not observed in other prediction areas. This highlights the significance of the cloud filtering algorithm, as snow pixels can seriously undermine the performance of machine learning models of glacier classification. Additionally, the spatial resolution of Landsat imagery used in this study can introduce a substantial number of mixed pixels—pixels that encompass both the glacier and the surrounding terrain. To address the former issue, a series of recommended algorithms has been previously suggested [
57].
The debris-free outline map generated using the NDSI approach exhibits a high degree of similarity with the NGI outlines, although not a perfect one, probably because NDSI is the current operational method for glacier mapping in Perú [11,52]. Although highly similar, differences exist because our approach uses Landsat 8 imagery, whereas the NGI uses Sentinel-2 as well as Landsat 8.
Prior studies have already concluded that debris-free ice can be accurately mapped using simple methods, such as the band ratio of the Red/NIR bands of Landsat or S2 data [26,42,53,64,109]. While our primary objective was to compare the predictive classification errors of machine learning algorithms using spatially distributed glacier-class data, it is important to note that NDSI and band-ratio approaches remain robust and widely used methods in remote sensing-based glacier monitoring. As this research demonstrates, these methods should not be dismissed. Nevertheless, evaluating the classification errors of glacier mapping with these approaches requires a proper assessment, typically involving a comparison with high-resolution satellite images [30,109], a procedure that is highly manual and inefficient. Therefore, we hope that a statistically sound approach can enhance current national glacier mapping efforts.
3.7. Methodological Limitations of the Study
Machine learning (ML) has been effectively used to map glaciers in recent years [110]; for example, [64] used approaches such as k-nearest neighbors (KNN), while [111,112] employed Random Forest (RF). The use of deep learning segmentation algorithms has also been suggested recently for glacier mapping [42,110]. However, their computational cost can be high, limiting nationwide mapping efforts at the moment. Additionally, deep learning approaches have disadvantages compared to conventional machine learning algorithms, as spatial validation methods cannot be readily applied to them.
Some limitations detected in this work concern primarily the training/testing pixel sampling scheme. When using multi-spectral optical satellite data for glacier mapping, an independent validation sample is usually not used, and maps are validated either manually [5,53,57] or through a comparison with high-resolution satellite images [30,109]. On the other hand, some studies used an independent sample as a means of validation [9]. We would argue that the sizes of the training and validation samples inevitably affect the classification and validation results.
We acknowledge that hyperparameter tuning would likely yield better results or slightly alter the best model's outcomes [107]. A few previous studies have tested the combined effects of hyperparameter tuning and spatial cross-validation; for example, [43] implemented hyperparameter tuning in both spatial and non-spatial settings, with mixed results. Therefore, we fixed the hyperparameters here for ease of comparison between the proposed validation methods, an approach similar to that of [36].
We also hypothesize that the size of the validation blocks needs calibration, as previous results have suggested [36]. The size of the blocks should be substantially larger than the range of spatial autocorrelation in the model residuals to provide a reliable error estimate.
In general, we suggest that more work needs to be done to (1) incorporate different pixel sampling schemes and sample sizes into consideration, (2) develop experimental benchmarks that incorporate hyperparameter tuning, and (3) understand the effect of the spatial dimensions of validation blocks on the possible reduction of error estimates.
Figure 1.
The geographical location of the selected study areas in this work, Perú. The background is JAXA's ALOS World 3D DEM, with reference glacier surfaces from the National Glacier Inventory produced in 2017. All plotting was done in QGIS 3.30.1-'s-Hertogenbosch.
Figure 2.
Flowchart of the processing steps presented in this study.
Figure 3.
The spatial CV process used in this study.
Figure 4.
Experimental (points) and fitted rescaled indicator variograms (curves) for the dataset with 1000 pixels. a) Blanca, b) Central, c) Huallanca, d) Huayhuash, e) Huaytapallana, f) Raura, g) Urubamba, h) Vilcabamba, and i) Vilcanota.
Figure 5.
The final results (Matthews correlation coefficient) of the experimental benchmarks after applying Algorithm 1 to the 9 studied areas. a) Blanca, b) Central, c) Huallanca, d) Huayhuash, e) Huaytapallana, f) Raura, g) Urubamba, h) Vilcabamba, and i) Vilcanota.
Figure 6.
The final results (Matthews correlation coefficient) of the experimental benchmarks, disaggregated by model type and study area. The reference error of each model is shown as red dotted horizontal lines.
Figure 7.
Moran's I results of cross-validation error.
Figure 8.
Maps of the spatial predictions of the compared models. NDSI is added as a comparison. The reference National Glacier Inventory is overlaid as a red contour.
Table 1.
OLI and TIRS band descriptions. The visible part of the electromagnetic spectrum is covered by bands 1–4 and 8, at a resolution of 30 m except for band 8, which has a resolution of 15 m. The NIR is covered by band 5 (30 m). The SWIR is covered by bands 6 and 7 (30 m).
Name and Resolution (m) | Spectral Range (µm) | Band Number
Coastal/Aerosol (30 m) | 0.435–0.451 | Band 1
Blue (30 m) | 0.452–0.512 | Band 2
Green (30 m) | 0.533–0.590 | Band 3
Red (30 m) | 0.636–0.673 | Band 4
Near InfraRed (NIR) (30 m) | 0.851–0.879 | Band 5
Short Wave InfraRed 1 (SWIR 1) (30 m) | 1.566–1.651 | Band 6
SWIR 2 (30 m) | 2.107–2.294 | Band 7
Table 2.
Allocation of Training and Test Samples within Respective Cordillera Regions.
| Cordillera | LS8 ¹ Composite Total Area (km²) | Non-glacier samples | Glacier samples | Test samples |
|---|---|---|---|---|
| Cordillera Blanca | 13 963.1 | 248 | 217 | 1000 |
| Cordillera Central | 5 957.4 | 248 | 245 | 1000 |
| Cordillera Huallanca | 271.9 | 247 | 238 | 1000 |
| Cordillera Huayhuash | 344.5 | 246 | 250 | 1000 |
| Cordillera Huaytapallana | 2 489.8 | 246 | 200 | 1000 |
| Cordillera Raura | 322.3 | 247 | 245 | 1000 |
| Cordillera Urubamba | 1 818 | 248 | 184 | 1000 |
| Cordillera Vilcabamba | 4 221.3 | 247 | 218 | 1000 |
| Cordillera Vilcanota | 6 179.7 | 246 | 242 | 1000 |
| Total | 42 135 | 2719 | 2511 | 9000 |
Table 3.
Rescaled Indicator Variogram Parameters for 1000 Pixels Across Cordillera Regions.
| Cordillera | Model ¹ | Range (m) | C0 ² | C ³ | kappa |
|---|---|---|---|---|---|
| Cordillera Blanca | Mat | 5428.204 | 3.57E-04 | 0.0355 | 0.5 |
| Cordillera Central | Exp | 371.4613 | 0 | 0.00475 | - |
| Cordillera Huallanca | Mat | 874.5616 | 1.14E-03 | 0.0184 | 1 |
| Cordillera Huayhuash | Mat | 3160.411 | 2.54E-03 | 0.1477 | 0.3 |
| Cordillera Huaytapallana | Mat | 1320.108 | 2.21E-03 | 0.004013 | 10 |
| Cordillera Raura | Mat | 1999.836 | 1.68E-02 | 0.07002 | 0.6 |
| Cordillera Urubamba | Mat | 684.1284 | 0 | 0.00736 | 10 |
| Cordillera Vilcabamba | Mat | 5326.374 | 1.30E-03 | 0.0264 | 0.4 |
| Cordillera Vilcanota | Mat | 3288.231 | 3.19E-04 | 0.03816 | 1.2 |
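The fitted models in Table 3 are nugget-plus-Matérn (or exponential) indicator variograms. A sketch of evaluating the Matérn semivariance γ(h) = C0 + C·(1 − ρ_κ(h/a)), where ρ_κ is the Matérn correlation with smoothness κ; note that this assumes one common parameterization of the range a, and gstat's `Mat` model may scale the range slightly differently:

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def matern_semivariance(h, rng, nugget, psill, kappa):
    """Nugget + Matern variogram: gamma(h) = C0 + C * (1 - rho_kappa(h / a))."""
    h = np.asarray(h, dtype=float)
    x = h / rng
    rho = np.ones_like(x)
    pos = x > 0
    rho[pos] = (2.0 ** (1.0 - kappa) / gamma(kappa)) * (x[pos] ** kappa) * kv(kappa, x[pos])
    out = nugget + psill * (1.0 - rho)
    out[h == 0] = 0.0  # semivariance is zero at lag zero by definition
    return out

# Cordillera Blanca parameters from Table 3 (kappa = 0.5 makes Matern = exponential).
gamma_blanca = matern_semivariance(np.array([0.0, 5428.204]),
                                   rng=5428.204, nugget=3.57e-4,
                                   psill=0.0355, kappa=0.5)
```

With κ = 0.5 the correlation reduces to exp(−h/a), so at h equal to the range the semivariance reaches C0 + C·(1 − e⁻¹).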
Table 4.
Selected hyperparameter data types and chosen values for each algorithm. Notations of hyperparameters from the respective R packages were used. p is the number of features.
| Algorithm | Reference | Hyperparameter | Type | Default |
|---|---|---|---|---|
| Gradient Boosting Machines (GBM) ¹ | [76] | n.trees | Integer | 100 |
| | | n.minobsinnode | Integer | 10 |
| | | shrinkage | Numeric | 0.1 |
| | | distribution | Nominal | bernoulli |
| Random Forest (RF) | [69] | num.trees | Integer | 500 |
| | | mtry | Integer | sqrt(p) |
| | | min.node.size | Integer | 1 |
| | | max.depth | Integer | 0 |
| Weighted k-Nearest Neighbors (KNN) | https://github.com/KlausVigo/kknn | k | Integer | 10 |
| | | distance | Integer | 2 |
| | | kernel | Nominal | gaussian |
| Logistic Regression (LR) | | family | Nominal | binomial |
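Table 4 uses the R-package notation (gbm, ranger, kknn). For readers working in Python, the closest scikit-learn analogues of those defaults can be sketched as follows; the argument names below are sklearn's, not the paper's, and the kknn-style kernel weighting has no exact sklearn counterpart:

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

models = {
    # ranger: num.trees = 500, mtry = sqrt(p), min.node.size = 1, max.depth = 0 (unlimited)
    "RF": RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                 min_samples_leaf=1, max_depth=None),
    # gbm: n.trees = 100, shrinkage = 0.1, bernoulli distribution (binary deviance)
    "GBM": GradientBoostingClassifier(n_estimators=100, learning_rate=0.1),
    # kknn: k = 10, Minkowski distance = 2 (Euclidean); gaussian kernel not available here
    "KNN": KNeighborsClassifier(n_neighbors=10, p=2),
    # binomial family = plain binary logistic regression
    "LR": LogisticRegression(),
}
```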
Table 5.
Mean Matthews Correlation Coefficients grouped by Cordillera and CV method.
| Glacier region | SP-CV ¹ MCC | NSP-CV ² MCC |
|---|---|---|
| Cordillera Blanca | 0.928 (0.0845) ³ | 0.949 (0.1054) |
| Cordillera Central | 0.937 (0.446) | 0.954 (0.4624) |
| Cordillera Huallanca | 0.877 (0.1201) | 0.937 (0.1800) |
| Cordillera Huayhuash | 0.753 (0.0196) | 0.830 (0.0576) |
| Cordillera Huaytapallana | 0.904 (0.1979) | 0.968 (0.2617) |
| Cordillera Raura | 0.915 (0.0532) | 0.931 (0.0699) |
| Cordillera Urubamba | 0.906 (0.3082) | 0.930 (0.3317) |
| Cordillera Vilcabamba | 0.891 (0.2067) | 0.917 (0.2326) |
| Cordillera Vilcanota | 0.906 (0.0618) | 0.929 (0.0847) |
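The metric reported throughout Tables 5 and 6 is the Matthews Correlation Coefficient, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch from binary confusion-matrix counts (the example counts are illustrative, not taken from the study):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # convention: 0 when any margin is empty

# Illustrative counts for a glacier / non-glacier classification.
score = mcc(tp=90, tn=95, fp=5, fn=10)
```

Unlike overall accuracy, MCC stays informative when glacier and non-glacier pixels are imbalanced, which is why it is a common choice for this kind of benchmark.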
Table 6.
Modified paired t-test: t-statistics and p-values (in parentheses) of the paired comparison of the MCC obtained after 50-repeated 5-fold CV using the two types of cross-validation for each glacier region.
| Glacier region | LR | RF | GBM | KNN |
|---|---|---|---|---|
| Cordillera Blanca | 0.05971 (0.4763) | 1.155 (0.1267) | 1.678 (0.04711) ¹ | 2.061 (0.02229) ¹ |
| Cordillera Central | 0.2374 (0.4066) | 0.8284 (0.2057) | 0.6774 (0.2506) | 1.391 (0.08516) |
| Cordillera Huallanca | 1.202 (0.1174) | 1.9135 (0.03076) ¹ | 1.4590 (0.07545) | 1.366 (0.08895) |
| Cordillera Huayhuash | 1.44537 (0.07735) | 1.6833 (0.04933) ¹ | 1.64397 (0.05329) | 1.64590 (0.0530) |
| Cordillera Huaytapallana | 1.3271 (0.09530) | 1.5579 (0.06284) | 1.910303 (0.03092) ¹ | 253.484 (–) |
| Cordillera Raura | 0.76974 (0.2225) | 0.34939 (0.3641) | 0.4885 (0.3136) | 0.3651 (0.3582) |
| Cordillera Urubamba | 1.59805 (0.04823) ¹ | 0.93018 (0.1784) | 1.2942 (0.1008) | 0.0655 (0.4739) |
| Cordillera Vilcabamba | 0.20655 (0.4186) | 1.61089 (0.04681) ¹ | 0.9120 (0.1831) | 1.3423 (0.09283) |
| Cordillera Vilcanota | 0.84632 (0.2007) | 0.62994 (0.2658) | 0.7284 (0.2348) | 0.48702 (0.3142) |
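A "modified paired t-test" is needed in Table 6 because the J = 50 × 5 fold-wise score differences are not independent (training sets overlap), which deflates the naive variance estimate. A sketch assuming the Nadeau–Bengio corrected resampled t-statistic, t = d̄ / √((1/J + n_test/n_train)·σ_d²) with J − 1 degrees of freedom (the paper's exact variant may differ):

```python
import math

def corrected_paired_ttest(diffs, n_train, n_test):
    """Nadeau-Bengio corrected t-statistic for J paired CV score differences."""
    j = len(diffs)
    mean = sum(diffs) / j
    var = sum((d - mean) ** 2 for d in diffs) / (j - 1)  # sample variance of differences
    se = math.sqrt((1.0 / j + n_test / n_train) * var)   # corrected standard error
    return mean / se

# Illustrative MCC differences (NSP-CV minus SP-CV) over 50 x 5-fold CV:
# J = 250 differences; with 5 folds, n_test / n_train = 1/4.
diffs = [0.02, 0.015, 0.03, 0.01, 0.025] * 50
t = corrected_paired_ttest(diffs, n_train=4, n_test=1)
```

The correction term n_test/n_train inflates the standard error relative to the naive paired t-test, making significance claims from repeated CV considerably more conservative.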