3.1 Snow Cover Fraction (SCF) Algorithm
The input data used for snow cover retrieval consists of Sentinel-2 (S2) images. The constellation is comprised of two twin satellites, namely S2-A and S2-B, which together provide a revisit frequency of 5 days in the area of interest. The study area is covered by five tiles: 32TNT, 32TPT, 32TPS, 32TQT, and 32TQS. The S2 Level 1C spectral bands are downloaded from CREODIAS (
https://creodias.eu/) and appropriately preprocessed in preparation for the core Snow Cover Fraction (SCF) algorithm. Three main steps are applied: i) conversion from digital number (DN) to Top of the Atmosphere (ToA) reflectance values using the quantification value provided in the Sentinel-2 metadata, ii) resampling of all bands to a spatial resolution of 20 m using cubic interpolation and cropping to the area of interest. Here, the raw S2 grid and coordinate reference systems are preserved to minimize changes as much as possible; iii) cloud masking using the S2 cloudless algorithm, which is available at
https://github.com/sentinel-hub/sentinel2-cloud-detector [
13]. Scenes with a cloud coverage greater than 50% are excluded from further processing.
The SCF algorithm retrieval is based on a Support Vector Machine (SVM) classification that is manually trained using an Active Learning (AL) procedure [
14].
All spectral bands are utilized as features, along with informative indices such as the Normalized Difference Vegetation Index (NDVI), which is calculated as the normalized difference between the near-infrared (NIR) and red bands, as well as the Normalized Difference Snow Index (NDSI), which is calculated as difference between the green and the shortwave infrared (SWIR). The SVM model utilizes a radial basis function kernel. To determine the model parameters, we employe a grid search strategy to identify the regularization parameter C and the kernel coefficient gamma. The grid is initialized with a user-defined range. The model selection process begins with a coarse grid and based on the obtained results, is refined around the values of C and gamma that performed the best. The best values are selected by evaluating the mean and standard deviation of the accuracy calculated in a cross-validation strategy with k-fold validation (k = 5).
The SVM model is trained exclusively with pure pixels under different illumination conditions i.e., diffuse light, direct light, and shadow. This is accomplished through visual inspection of the spectral signatures of the collected training samples, as well as considering the characteristics of adjacent pixels such as topographical features and vegetation in the surrounding area. The selected training data should represent distinct classes of "snow" and "snow-free" accurately, as they play a crucial role in defining the hyperplane, which is the decision function that separates the two classes. In detail, the trainings that define the hyperplane are the so-called support vectors. Once the hyperplane is defined, we can establish a correlation between the SCF and the distance to the hyperplane. This linear relationship is observed by considering an external SCF reference, specifically a WorldView (WV) image.
The AL procedurevre helps accelerate the learning process of the classification by involving the user in collecting training samples iteratively. The training selection was performed ad hoc for each scene and iteratively by the user, visually assessing the results for each scenario. Consequently, different trainings might be selected for different scenes, as well as different model parameters. This approach ensures a scene-specific classification tailored to the final purpose of achieving the highest possible accuracy for the intended product assimilation.
In previous studies [
15,
16], a similar approach was applied. However, previous studies focused on simple classification, while here we estimate SCF (Snow Cover Fraction). Additionally, an improved version of this algorithm is presented by [
17], that is based on the automatic selection of training data. This represents a big advantage when considering large areas and long time-series of data, while it is preferable to collect ad hoc trainings for achieving better performances.
3.2 Soil Moisture Content (SMC) Algorithm
The primary input data for the Soil Moisture Content Estimation consists of Sentinel-1 (S1) images. These satellite data are Synthetic Aperture Radar (SAR) images, which take advantage of providing, after appropriate preprocessing operations, a 2-D backscattered image of the terrain independent from weather conditions and daylight. The joint use of the two satellites S1-A and S1-B made it possible to obtain a repeat cycle of 6 days over the area of interest with SAR Interferometric Wide (IW) swath mode. The dual-polarization (VV + VH) Ground Range Detected products have been used as SAR input data for the soil moisture estimation. However, Copernicus Sentinel-1B encountered an anomaly related to the power supply of the instrument's electronics on 23 December 2021. Since then, the satellite has no longer been able to provide radar data. As a result of this failure, the only available satellite has become S1-A, and consequently, the repeat cycle has increased to 12 days.
Despite that repeat cycle, many orbits partially cover the area of interest of Tyrol and South Tyrol in Ascending or Descending mode, increasing the temporal resolution of the SMC time series using the six relative orbits: 66, 168, 95, 15, 117 and 44 (
Figure 2). The coverage for each CRNS used within the project is shown in
Table 3.
The algorithm used to estimate the SMC is based on a Machine Learning (ML) approach described in [
18], together with the input data utilized and features selected and validated globally. The source code is available online at
https://zenodo.org/records/4552813 and the related documentation is available at:
https://pysmm.readthedocs.io/en/latest/. The algorithm uses a Gradient Boosted Regression Trees (GBRT). It exploits the server-side processing capabilities of Google Earth Engine (GEE) eliminating the requirements to download or preprocess the input datasets. All the input datasets are available on GEE. They are used as training data to extract features and for masking pixels where the algorithm cannot estimate the SMC (i.e. pixel with snow cover presence, with high vegetation coverage, or in layover shadow).
In particular, the data selected from the 461 automatic stations of the International Soil Moisture Network (
https://ismn.geo.tuwie n.ac.at) provide the basis for the large-scale validation of satellite-derived soil moisture products.
The other data collected on GEE are Landsat-8 (shortwave reflectance & thermal radiance) [
19], MODIS MOD13Q1 Enhanced Vegetation Index (EVI) [
20], soil temperature and snow-water-equivalent from Global Land Data Assimilation System (GLDAS) [
21], Copernicus Global Land Cover Layer [
22], soil information from OpenLandMap (OLM) [
23] and SRTM Aster DEM [
24].
The original algorithm was implemented to obtain a spatial resolution of 50 m, but in this work the product was resampled at 20 m using a bilinear interpolation (
Figure 3). In this way, the spatial resolution is consistent with that of the SCF product. The GBRT algorithm used is a family of tree-based methods. This characteristic entails that it is compatible with different data and scales. Moreover, it has a relatively low computational cost associated with algorithm training and target prediction.
The classes included in the estimation from the Copernicus Global Land Layer are the following: bare/sparse vegetation, cropland, herbaceous vegetation, open forest and shrubs. Moreover, a masking based on several thresholds is implemented. In more detail, stations from ISMN must be available on the same day of the satellite acquisition and considering the limited penetration of C band SAR data, only stations that measure SMC at no more than 5 centimeters are included.
In general, also pixels in layover shadow or foreshortening for S1 images are filtered out, characterized by radiometric saturation, terrain occlusion or clouds in the L8 pixel quality band, with MODIS EVI greater than 0.5 and, regarding GLDAS estimates, pixels below a threshold temperature of 275 K and only SWE values of 0 kg/m2 are applied. The algorithm assigns a null value to pixels for which an SMC estimate could not be performed due to masking or because the corresponding satellite acquisition did not cover them.
The SMC dataset made available covers 1 October 2019 to 30 September 2022. This way, three spring-summer seasons are available, like what was delivered for SCF. The algorithm also estimates the SMC in the winter season, although in this case, considering that CRNS stations are located at high altitudes the thresholds set by GLDAS severely limit the number of points at which an estimate is available and, sometimes no estimate is available. In the latter case, the maps were discarded.
The dataset covers an area of 500x500 pixels centered on the 5 CRNS stations to be consistent spatially with the SCF dataset coming from S2 data.
3.3. Validation SCF
A very high-resolution dataset is exploited to evaluate the quality of the product developed since the lack of ground data does not allow for a systematic validation analysis by considering the different topographical and morphological conditions of the test area, which is predominantly mountainous.
To this purpose, we obtained some WorldView images, that we first prepared through a process of orthorectification and co-registration to make them comparable with S2-derived maps. The collected WorldView images used for verifying the reliability of the developed product are not always correspondent to the areas covered by the maps here presented (see Livigno and Sonthofen in
Table 4). Indeed, to obtain a dataset that covers as many scenarios as possible in terms of different topographies as well as different periods of the year or snow coverage, in some case we also exploited images close to the areas of the maps, but not perfectly coincident with them.
Furthermore, a cross-comparison with the Copernicus product is performed to evaluate the developed product with respect to the standard product especially in complex situations as, for example, in case of shadow.
3.3.1. Very High-Resolution Data Preparation
Six WorldView (WV) images are acquired following the criteria of selecting cloud-free scenes, acquisitions possibly coincident or very near to a S2 acquisition and snow cover conditions that might be interesting to test, as variable SCF over the scene (mixed pixels due to vegetation or melting process). The WV images used are both WV2 and WV3.
First, an orthorectification is needed. Very high-resolution (VHR) images often contain geometric distortions due to sensor orientation, topographic relief, and Earth's curvature. The orthorectification procedure aims to correct these distortions and transform the image into a georeferenced and geometrically accurate representation. These data are provided with an orthorectification kit consisting of the Rational Polynomial Coefficients (RPC). These are the coefficients of a fractional polynomial function, that link the image coordinates with the object space.
After being orthorectified, the image is co-registered and aligned to the S2 grid and reference system, by keeping a final resolution of 1 m.
Once the WV and S2 grid are made comparable, the VW needs to be classified. Analogously to what done for the S2 images, we used also for this case a SVM model trained separately for each VW image by manually collecting the training samples. The procedure is very similar to what described in the previous section, with the main difference that our target is here a binary classification.
Finally, we performed an aggregation of the VW images to 20 m spatial resolution thus obtaining the SCF to be compared with the S2 images.
3.3.2. Cross-Comparison with WorldView
The comparison with WV data is done by analyzing different test sites through different metrics to understand the goodness of the SCF estimation in different areas and variables conditions such as different seasons, topography and snow coverage conditions.
Figure 4 shows an example of comparison between the developed product and the reference VHR imaged classified: on the left-side the S2 map while on the right-side WV map.
The following table (
Table 4) reports the metrics calculated for the different areas, together with the dates of WV and S2 images, respectively, where the comparison with very high-resolution images has been performed.
From the
Table 4 we can see that on average the correlation is for all test sites, around 0.9, except for Livigno, where it is 0.5. The reason behind this drop in performance could be related to the different acquisition dates in the two images, S2 and WV. Indeed, as you can see in
Figure 5, before the WV acquisition there was a snowfall event, which, given the period (September 2021), was only an ephemeral snowfall lasting a few days and indeed it is not visible in the image acquired just two days later by S2. On the left side of the
Figure 5 the RGB false color (FC) of SCF maps by S2 (up) and WV (down) are shown; on the right side the classification results used for the comparison and the metric estimation.
3.3.3 Cross-Comparison with Copernicus
We performed the comparison for the five test sites in the period ranging from 1 October 2020 to 10 May 2023.
Table 5 lists the number of the analyzed scenes for each test site, the period and the extent over which the comparison took place.
Starting from the available scenes we computed bias, unbiased RMSE, RMSE and the cross correlation to compare the two products. The results are shown in
Table 6.
3.4 Validation SMC
The SMC algorithm was already validated by [
18] at a global scale. However, in this work, a few alpine automatic monitoring stations measuring soil water content (SWC) in Val Mazia (South Tyrol – Italy) were considered (
Figure 6) for comparisons with SMC maps estimated by the ML algorithm [
26]. These stations record meteorological and biophysical variables of the long-term socio-ecological research LT(S)ER site Matschertal / Val di Mazia (
https://browser.lter.eurac.edu/). The data can be downloaded from this website after registration. This approach was considered because tmost of the ISMN network used to train the SMC algorithm is not located in alpine or, in general, in mountainous areas like those considered in the project. Consequently, this comparison can be helpful to better understand the behavior of the algorithm in this specific environment. Among many parameters like air temperature, relative humidity, precipitation and wind speed, etc. these climate stations record the SWC at different depths from 2 to 50 cm, averaging 15 samples taken every minute, aggregating them into a single value. Considering the penetration characteristic of the SAR signal, only the measurements at 2 and 5 cm were considered, while the ones at 20, 40 and 50 cm were excluded. For the comparison, the satellite acquisition time was considered, and the ground measurements from one hour before to one hour after that time were further averaged and utilized.
The stations are equipped with Campbell Scientific TDR (Time Domain Reflectometry) sensors, able to measure soil temperature and SWC (range 0 to 52%, accuracy ±3% for Electrical Conductivity ≤ 10 dS/m).
The monitoring stations are located in the western part of South Tyrol (Val Mazia catchment; an inner-alpine dry valley in the Italian Alps) and are shown in
Figure 6. The four stations were selected among a network of 24 climate stations considering their peculiar characteristics, especially in terms of elevation, exposure, slope, and landcover type, as observed in the photos in the
Table 7 . By referring to the period May – October of the three observation years 2020 -2022, different metrics were considered based on what is described in [
27,
28,
29]. Mean Absolute Error, Bias and Unbiased Root Mean Square Error are reported in
Table 8. The selection of the study period considers that these months have the lowest probability of snow cover, making it easier to obtain a consistent and continuous series of SMC maps, which is helpful for the time series analysis.
Figure 8,
Figure 9,
Figure 10 and
Figure 11 show examples of the trends of the SMC estimated by the algorithm compared to that extracted from the automatic meteo stations at 2 and 5 centimeters. The graphs for different ground stations and years are related to May –October, when the stations are usually not affected by snow, to highlight the temporal trends. The measurements and estimations are acquired on the same day as described above. In 2022, the number of SMC estimates from remote sensing data is reduced as only the S1-A satellite was available then.
The three curves show similarities in the trends although the estimated values present a more compressed dynamic. A possible reason can be related to the temporal filter applied to SAR data to reduce the speckle of the SAR images. The measurements at 2 and 5 centimeters are very similar except for B3 and M2 stations, which are a meadow and a pasture station, respectively, but are characterized by a flat terrain. In this B3 case, where the dynamics are broader, the estimated value is closer to the ground measurement at 2 centimeters, while in M2 case the estimation over the entire analysis period is between the two ground estimations.
3.4.1 Comparison with CRNS and Point-Scale SMC Data
A further comparison with CRNS and point-scale SMC data illustrates a potential use case for the data at the site of Leutasch. The Leutasch site is part of the COSMOS Europe network [
30]. The CRNS based SMC values were obtained using the calibration procedure described in [
31]. The data needs to be corrected for changes in incoming cosmic radiation, atmospheric pressure and air humidity. The residual signal is highly sensitive to changes in hydrogen content in soil, vegetation and snow [
32]. CRNS represent a large footprint of several hectares and a depth of several decimeters [
33]. The signal is thus more representative for the hydrological state of the location than point-scale measurements. Furthermore, a depth-profile of point-scale SMC measurements using SMT100 sensors is available.
The trends of the comparisons obtained are shown in
Figure 12 and
Figure 13. Therein, it is visible that CRNS and remote sensing-based SMC estimates show similar temporal soil moisture dynamics. However, due to the larger measurement volume, in particular with regard to the depth of the signal, the CRNS data show different absolute values. The point-scale data of the upper-most SMT100 probe in 5 cm depth, in contrast, gives very similar levels of SMC. Both SMT100 and CRNS sensors have higher dynamic range than the remote sensing-based data. Thus, due to different integration volumes and spatial coverages the combination remote sensing and in-situ based SMC data can give further insights into the hydrologically relevant dynamics. This is particularly true if combining spatially representative CRNS data with regionally available remote sensing products.