1. Introduction
Many functions are carried out by soil, ensuring a fundamental contribution to human life and well-being, however, to understand the potential of soil in providing functions, it is necessary to characterize the variability of soil properties in space and time [
1]. Soil texture is one of the most important properties that influence many soil functions and processes through both the absolute values of its properties and their spatial variability [
2]. In particular, it controls variations in soil moisture, availability of water to crops, erosion and aggregation. In addition to direct measurements in the laboratory, different types of soil sensing technologies, from laboratory reflectance diffuse spectroscopy to proximal and remote sensing, can provide rapid and reliable information on soil properties variability at different spatial and temporal scales [
3,
4,
5]. However, data measured in laboratory and/or with different soil sensing methods require their fusion and integration [
6,
7,
8]. A proper integrated use or synthesis (fusion) of different spatial data results to be more informative than the one coming from individual sources and provides new knowledge, understanding, or explanations of the processes [
9,
10,
11].
When mapping and modeling topsoil there has always been a compromise between the need to produce models of high spatial resolution but limited extent (at local scale) and the need to produce models of coarse spatial resolution but wide extent (as at regional or country scale) [
12]. If extended spatial models could also be produced at high resolution, this would undoubtedly mean major advances in understanding the physical and biological processes that take place on the Earth's surface [
13]. Morphometric attributes as elevation, slope gradient, aspect, local relief, slope curvature, have been commonly used to estimate the surface lithology and model the spatial variability of soils [
14,
15]. However, most of these studies used digital elevation models with a spatial resolution greater than 10 m. Although such a scale may be suitable for some environmental studies, it may be unsuitable for accurately describing the spatial variability of highly changing landscapes [
16], such as the characteristic landscapes of the forested mountain of some Calabrian areas in southern Italy [
17]. The climate, the topography, the vegetation cover, and land use are the main factors that influence soil properties and determine its heterogeneity at many scales [
18].
One of the most important recent developments in remote sensing for lithological mapping of the soil surface is the increasing availability of high-resolution digital elevation models (DEM)s derived from LiDAR (Light Detection and Ranging) elevation data. The high density of measurement points obtained from a LiDAR sensor allows for precision mapping of land surface features and also makes it possible to clearly distinguish soil from vegetation [
12]. LiDAR data are widely used to study landscape [
19], topography [
20], tree attributes ([
21,
22] phenotyping [
23], and biomass estimation [
17] but can be used to assess soil properties [
24]. The use of LiDAR also enables to generate large-scale DEMs with sub-meter resolution in extensive heterogeneous mountain forested areas [
25]. Recently, the DEM resolution derived by LiDAR data has increased reaching even 0.5 m with the use of airplanes [
26] up to very accurate resolutions (~0.03 m) using the drone [
27]. Airborne Laser Scanning (ALS) is the most highly accurate and efficient method to acquire 3D data from large areas and to generate DEMs, such as Digital Terrain Model (DTM) and Digital Surface model (DSM). Moreover, using geographic information systems (GIS) algorithms, these new high-resolution elevation data can be used to provide several primary and secondary topographic attributes.
It is widely accepted that a larger number of soil samples can more accurately describe and model spatial heterogeneity, however the limited budget for soil measurements generally restricts the number of samples when using standard laboratory techniques as they are generally expensive and time consuming [
28,
29]. As an alternative soil spectroscopy techniques, such as visible-near-infrared spectroscopy (Vis-NIR, 350-2500 nm), can be used to measure soil properties such as the different particle sizes (clay, silt and sand), organic and inorganic carbon content and iron content quickly, accurately and relatively inexpensively [
30,
31]. This allows the use of soil data from hyperspectral spectroscopic measurements, appropriately reduced in dimensions, as an additional source of information supplementing direct laboratory soil measurements, in order to improve the prediction of the variable of interest.
To produce soil maps associated with some measure of prediction uncertainty by efficiently combining (fusing) heterogeneous data, several techniques spanning from traditional spatial statistics, geostatistics to machine learning have been used in recent decades [
32,
33,
34,
35]. Actually, the different computing technologies lead to a single formulation of spatial modelling expressed by the following formula [
36]:
where:
x: location vector in one, two or three dimensions,
Z(x): the random variable Z at location x,
μ(x): deterministic component or trend (drift),
ɛ’ (x): stochastic component, i.e. spatially dependent residual from μ(x)
ɛ’’: non spatially correlated component (white noise or unexplained variability).
Such a formula makes it possible to model the spatial variability of any soil property on different spatial scales, separating the deterministic component (at long scale) from the stochastic one (at short or local scale). Non-geostatistical techniques (multi-linear regression or machine learning) have generally been used to describe the trend, while geostatistical techniques to quantify the stochastic component of variability. These two types of approaches can be kept separate or combined in hybrid techniques (non-stationary geostatistics) [
37]. One of the latter is kriging with external drift [
38,
39], when the trend is externally modelled by auxiliary variables and trend and residuals are simultaneously estimated in a single system with a jointly calculated covariance function. However, the trend of the target variable or some of its transformation functions must be expressed by linear models.
The objectives of the study were: 1) to evaluate the ability of a multivariate approach based on non-stationary geostatistics to merge remotely sensed high-resolution LiDAR-derived topographic attributes with Vis-NIR diffuse reflectance spectroscopy and laboratory data to produce high-resolution maps of soil sand content and its estimation uncertainty in a forested catchment in southern Italy, and 2) compare this approach with the commonly used univariate approach of ordinary kriging.
3. Results and discussion
The basic statistics of the soil attributes and elevation measured at 135 points are reported in
Table 1. As can be seen, sand, silt and clay exhibited sufficiently symmetrical distributions, while SOC showed a great positive asymmetry. Therefore, it was decided to transform SOC through a Gaussian anamorphosis process before interpolating it with ordinary kriging.
Regarding the diffuse reflectance spectra measurements, in
Table 2 are reported the results of PCA. Only the principal components (eigenvectors) with an eigenvalue greater than 1 are reported in
Table 2. It is to note that the first PC explains more than 85%, due to the high correlation among the reflectance at the different bands, while the second explains only less than 10% of the total variance. Since each of the remaining four PCs explains just or less than 2% of the total variance, it was decided, in order to simplify the subsequent analyses and facilitate the interpretation of the radiometric indices, to retain only the first two PCs which cumulatively explain almost 95%.
The composition of the first two rotated PCs can be derived from
Figure 2.
Practically, all bands (410-1760 nm) from the visible (Vis, 380-760 nm) to the near infrared (NIR, 760-1500 nm) and part of the short-wave infrared (SWIR, 1500-3000 nm), except for a narrow range centered on about the 1400 nm band, weighed positively and significantly on PC1 (
Figure 2). This can then be interpreted as an indicator of the average diffuse reflectance of the soil in the indicated radiometric range and of its color. Furthermore, remembering that absorbance is inversely proportional to reflectance, the PC1 can also be assumed to be an inverse indicator of iron oxide content [
65].
On the PC2, mainly the bands from 1760 to 2500 nm weigh positively and significantly (
Figure 2), comprising the major absorption bands for OC (primarily 1772 nm; 1871 nm; and to a lesser extent 660 nm; 2070 nm; 2177 nm; 2246 nm; 2351 nm; 2483 nm) and for clay (mainly 2201 nm and to a lesser extent 1877 nm; 1904 nm; 2177 nm; 2192 nm; 2220 nm; 2492 nm) [
65]. The PC2 can therefore be interpreted as an inverse indicator of SOC and clay contents.
3.1. Raster data
Table 3 shows the basic statistics of elevation and topographic attributes obtained from LiDAR data and calculated at the nodes of a one-meter mesh grid. As can be seen, the study area with a maximum height difference of about 300 m is characterized by a high variability in its topography and consequently a high local relief. There are some very steep areas with slopes of about 72 degrees, which are practically inaccessible, where it was impossible to take soil samples. LS and SPI also show very high outliers and there are areas with negative curvature which indicate a surface is concave, and others with positive curvature (convex surface) with a prevalence of those with negative curvature (negative skewness).
The high variability of the topographic attributes can be explained by the changing morphology of the catchment under study (
Figure 1) with an increasing elevation gradient from west to east and southeast. Furthermore, the area is characterized by a dense drainage network with numerous tributaries of the main stream, which are affect by of deep and narrow incisions.
In order to apply the KED to the sand content data, all auxiliary variables had to have been calculated or estimated at the nodes of the 1 m DEM. Therefore clay, SOC, PC1 and PC2 kriging estimates were added to the topographic attributes.
An isotropic model consisting of a nugget effect and an exponential model with a range of 394.5 m and therefore an effective range of about 1185 m (approximately the maximum size of the basin in direction 132° from the north) was fitted to the experimental variogram of the clay content (
Figure 3).
This denotes a great continuity in the variation of clay content as is evident from its kriging map (
Figure 4a) in which the highest contents occurred in the lowest parts whereas the lowest ones in the highest parts in the East and Southeast.
Different is the behavior shown by SOC, which was previously transformed to a Gaussian variable as mentioned before. An isotropic model including a nugget effect and a spherical model with a range of 137 m was fitted to the experimental variogram of its Gaussian transform (
Figure 5).
The variability of SOC is thus characterized by less continuity than that of clay with spatial autocorrelation over a shorter range (137 m). The implication of the above is also evident in the back transformed map of SOC's kriging estimates (
Figure 4b). It is characterized by a pattern of numerous small areas of limited size. However, two macro-areas can still be distinguished: one above the median line (132° N) of the basin with higher values and the part south of this line with lower values.
An isotropic LMC consisting of three basic structures: a nugget effect, a spherical model with a short-range of 159 m and a spherical model with a long-range of 716 m was fitted to the set of direct and cross experimental variograms of PC1 and PC2 (
Figure 6).
As can be deduced from
Figure 6, both in PC1 and PC2, the largest spatial component is unfortunately represented by the discontinuity at the origin of the graphs (intercept). That is called nugget effect and results from insufficient density of sampling with large unsampled areas. It is also shown that in the cross-variogram, the model reaches zero within a distance of about 150 m (very low sill). This means that the extraction and rotation of the PCs produced not only statistical but also quite spatial independence of the two components at a relatively short distance. This is also verified by the high distance from the intrinsic correlation curve (dashed line), which represents the maximum possible spatial correlation between the two variables (Wackernagel, 2003).
Figure 7a shows the map of PC1, which has a somewhat reverse pattern to that of SOC. This is consistent with the previously given interpretation of PC1 as a sort of spatial indicator of average soil reflectivity. It is known in fact that soils with a higher organic matter content are generally darker and therefore less reflective [
65].
Figure 7b shows the map of PC2 with a large zone of higher values in the north. Interpretation becomes more difficult here, as the area at lower elevation (
Figure 1) includes both zones with higher clay contents (north-west corner) (less reflective) and zones (south and southeast) with low clay and organic matter contents (more reflective). Actually, the diffuse reflection by soil in the range 1700-2500 nm depends on the presence of other components beyond the texture and organic carbon. Furthermore, selective absorption also depends on the particular composition of the clay and organic matter [
66].
3.2. Coregionalization data set
The correlation matrix between sand content and the variables assumed as potential auxiliary variables is reported in
Table 4. As can be seen, apart from the expected high negative correlation with clay, and the smaller positive correlation with elevation, the other correlations are rather low. It was therefore decided to exclude SPI and curvature as possible auxiliary variables, nevertheless to retain SOC and TWI to account for possible local influences of organic carbon and water content on estimation of sand content, because, generally, the different contents of sand are affect the soil moisture and the rate of decomposition and stabilization of the organic carbon [
67]. On the other hand, spatial correlations might also be nonlinear and therefore not evaluated by Pearson's correlation coefficient.
3.3. Kriging with external drift
3.3.1. Trend estimation
For the trend calculation, various combinations (T=17) of internal (a linear function of x, y coordinates) and external drift (different linear combinations of the ten auxiliary variables selected: 1. clay; 2. SOC; 3. PC1; 4. PC2; 5. elevation; 6. slope; 7. aspect; 8. LS; 9. TRI; 10. TWI) were compared. For the internal drift, it was preferred to consider only a linear function of the coordinates (deterministic type variation), treating most of the variation in sand content due to spatial coordinates as stochastic and then described by the generalized covariance function. As can be seen from the examination of
Table 5a in which the various trend models are sorted according to mean rank, the best model with the lowest mean rank and also the lowest mean square error was the one consisting of a constant coefficient plus clay content as auxiliary variable.
The automatic structure identification is made in two steps (
Table 5a and b). In
Table 5a, each line indicates the performance of an order k. The results based on polynomials fitted are reported under the heading Ring 1 and Ring 2. They correspond to the separation of data in two subset (ring 1 and 2) to improve the performance of jack-knife procedure: data in the ring 1 are used for the estimation of points of rings 2, and vice versa [
39]. The second steps (
Table 5b) is the identification of the generalized covariance, which is the equivalent of the variogram in the non-stationary case.
In the results, it is surprising is that the trend model with the inclusion of the first topographical attribute (elevation), although significantly correlated with sand content (
Table 4) occupies the 5th position with lower mean absolute error but significantly higher mean square error and mean rank. The inclusion of the other topographic attributes worsened even further the performance of the model.
The interpretation of these results might be that the particle size composition of the soil in this catchment is essentially related to other intrinsic properties, while is little sensitive to relief features.
3.3.2. Generalized Covariance Function (GCf) identification
With regard to the estimation of the generalized covariance function, the model that realized the jackknife estimator closest to 1 is the one consisting of the nugget effect plus the linear component with scale equal to 200 m (
Table 5b). The spatially uncorrelated component is approximately one order of magnitude larger than the spatially correlated one.
Synthesizing the two previous results on trend estimation and identification of GCf, it can therefore be said that according to the KED approach, all intrinsic spatial variation is stochastic with a prevalence of the spatially uncorrelated component (nugget effect).
In consideration of the fact that measurements of texture and in general of all soil properties are demanding of both man-hours and costs, it was decided to assess how much the sand model degraded by considering only topographic attributes as auxiliary variables. These are generally more available (less expensive) and at a much higher sampling density than observations of soil properties.
Eight trend models were then compared (
Table 6), by subsequently adding all the other seven topographic attributes (TRI, aspect, slope, LS; curvature, SPI, TWI) as auxiliary variables to the elevation.
The best model in terms of both minimum mean rank and minimum mean square error is the one that uses elevation as the only auxiliary variable (
Table 6a). This is indeed the only topographic attribute that showed a significant correlation with sand content, while the correlations with the other attributes were rather low, probably due to the uncertainty of their estimates and the high morphological variability due to denudation processes (landslides and water erosion) that affected the site [
40,
68,
69]. These processes caused a reworking and a spatial redistribution of surficial soil.
As far as the generalized covariance function is concerned, it consisted again of two structures, nugget effect and linear component with a 200-m scale (
Table 6). However, there was a marked prevalence of the spatially uncorrelated component, which was well about 2 orders of magnitude higher than the structured component. The exclusion of clay as an auxiliary variable in the trend therefore also resulted in an increase of randomness regarding the stochastic component of the sand estimation variance.
Assuming ordinary kriging as the univariate reference approach, an isotropic model containing three basic structures was fitted to the experimental variogram of the sand content: a nugget effect, a spherical model with a range of 150.99 m and a spherical model with a range of 513 m (
Figure 8).
The variogram model appears to be well structured, upper bounded and with the spatial component at shorter range approximately twice as large as the uncorrelated spatial component and the structured component at longer range.
3.4. Comparison among the three approaches
The results of the cross-validation of the three models for estimating sand content are shown in
Table 7: Model1, whose trend contains only the clay content as auxiliary variable; Model2 in which the trend has only the topographic attribute of elevation, and Model 3 with no trend.
As expected, Model1 was the best in terms of observation-estimate correlation, lowest bias, best accuracy and lowest correlation between standardized errors and estimates. This result was already predicted by comparing the ten trend models with different combinations of auxiliary variables including both soil attributes and topographic attributes (
Table 5 and
Table 6). However,
Table 7 allows to assess the actual advantage obtained by supplementing the sand content observations with elevation, obtained from high spatial resolution LiDAR data, as the auxiliary variable. Model2 compared to Model3 has a higher observation-estimate correlation albeit small, a lower bias and a lower standardized error-estimate correlation. The reason for this rather modest result after the inclusion of elevation as an auxiliary variable is probably due to the particular contextual characteristics. However, this should not dissuade from continuing to investigate the real advantages of using high spatial resolution LiDAR data to improve the prediction of soil attributes. Soil measurements are generally expensive and the collection of soil samples at some locations may be difficult or even impossible as occurred in this case due to the extremely steep slope.
The previous results made it possible to compare the three models from a statistical point of view, essentially in terms of smoothing effect, bias, and accuracy. The purpose is now to continue the comparison in terms of mapping or graphical restitution of the sand content estimates and their corresponding uncertainties.
Maps of the sand content estimates for the three models are shown in
Figure 9.
It is firstly to be said that the three maps are coherent in reporting the main structures of spatial dependence. However, it is possible to highlight clear differences among the maps. The one for Model1 (clay as the auxiliary variable,
Figure 9a) shows a high degree of spatial continuity and, and mostly reproduces the inverse relationship with the clay due to the strong negative correlation between the two variables. The Model2 map (trend with elevation as the auxiliary variable,
Figure 9b), although quite similar to the previous one, shows greater short-range variability due to the finer spatial resolution of elevation compared to the sampling scale of clay. However, it is worth remembering that clay was interpolated with ordinary kriging at the DEM grid nodes to be able to apply KED. Finally, the Model3 map (no trend,
Figure 9c) shows less spatial continuity compared with the one of the one of Model1 and is characterized by more numerous micro-structures of limited extent, probably due to the random nature of the spatial variation of sand content. In
Figure 10 the uncertainty of the estimates, as expressed by the standard deviation of estimation, is compared for the three models.
Being OK and KED linear interpolators, the estimation standard deviation depends on the sampling scheme and the mathematical model of the variogram for Model3, whereas the mathematical model of the generalized covariance function for Model1 and 2. Consistently, the 3 models show the lowest values in the central part, where sampling is densest, and the highest values at the edges of the area. However, compared to Model3 (
Figure 10c), which faithfully reproduces the sampling pattern with minima at the sampling locations, Model1 (
Figure 10a) and 2 (
Figure 10b) reveal more spatial continuity with a greater accentuation of short-range variability for Model2.
What these results highlight is that the choice of auxiliary variables can have a significant impact on mapping of estimation and its uncertainty of the variable of primary interest. Even if the same macro spatial structures are reproduced by the various models, there can be appreciable local differences and only a careful validation process can help in choosing the optimal model.
4. Conclusions
The main objective of this work was to evaluate whether combining observations of soil properties with LiDAR-derived topographic attributes can be considered an efficient and straightforward approach for mapping soil texture (sand content) both at fine resolution and at large spatial scale. Through LIDAR, we have topographical knowledge of a vast area compared to manual measurements, and the results highlighted the importance of applying this tool for accurate soil measurements. Various linear trend models using different auxiliary variables were compared. Statistically, the best one used clay content as the auxiliary variable. Actually, the improvement obtained by using only LiDAR data (elevation) compared to the univariate (no trend) model was rather marginal.
Nevertheless, the terrain morphology detected by Lidar can affect the soil moisture and nutrients available and play an important role in forest dynamics, and growth rates.
However, it is widely recognized that there are various advantages, including economic ones, that would result from the use of LiDAR data in predicting soil lithology (Prince et al., 2020), and that LiDAR elevation data are now becoming the new standard for the production of high-resolution DEMs worldwide. What was said before should therefore lead to the development of new and novel methods. We believe that with the increased availability of data, an improvement in digital soil mapping (DSM) can be achieved by the application of Machine Learning (ML) techniques to the development of trend models, leaving geostatistics to assess and model the stochastic component of spatial variation. Anyway, ML models should be more interpretable and explainable [
70] and not just more accurate, so that important scientific insights can be extracted from soil data and such models can serve as an effective source of information and knowledge.
Figure 1.
Location of study area and soil sampling points.
Figure 1.
Location of study area and soil sampling points.
Figure 2.
Pattern of loadings multiplied by 100 of the first two principal components with values above 68.73. 68.73 is the root mean square of all the values multiplied by 100 in the matrix of PC loadings.
Figure 2.
Pattern of loadings multiplied by 100 of the first two principal components with values above 68.73. 68.73 is the root mean square of all the values multiplied by 100 in the matrix of PC loadings.
Figure 3.
Experimental variogram (filled circle) and the fitted model (solid red line) for clay content. Experimental variance (horizontal dashed line) is also showed.
Figure 3.
Experimental variogram (filled circle) and the fitted model (solid red line) for clay content. Experimental variance (horizontal dashed line) is also showed.
Figure 4.
Maps of clay (a) and soil organic carbon (b) content obtained with univariate ordinary kriging (white zones represent those with emergent rock).
Figure 4.
Maps of clay (a) and soil organic carbon (b) content obtained with univariate ordinary kriging (white zones represent those with emergent rock).
Figure 5.
Experimental variogram (filled circle) and the fitted model (solid red line) for Gaussian SOC content. Experimental variance (horizontal dashed line) is also showed. The symbol G before the variable name stands for Gaussian-transformed variable.
Figure 5.
Experimental variogram (filled circle) and the fitted model (solid red line) for Gaussian SOC content. Experimental variance (horizontal dashed line) is also showed. The symbol G before the variable name stands for Gaussian-transformed variable.
Figure 6.
Linear model of coregionalization (LMC) of PC1 and PC2. Black points are the experimental variograms; red solid lines are the fitted models, red dash-dotted lines in the cross-variogram are the hull of perfect correlation, black dashed line is the experimental variance.
Figure 6.
Linear model of coregionalization (LMC) of PC1 and PC2. Black points are the experimental variograms; red solid lines are the fitted models, red dash-dotted lines in the cross-variogram are the hull of perfect correlation, black dashed line is the experimental variance.
Figure 7.
Maps of the (a) first (PC1) and (b) second (PC2) principal components calculated from diffuse reflectance spectra.
Figure 7.
Maps of the (a) first (PC1) and (b) second (PC2) principal components calculated from diffuse reflectance spectra.
Figure 8.
Experimental variogram (filled circle) and the fitted model (solid red line) for clay concentration. Experimental variance (horizontal dashed line) is also showed.
Figure 8.
Experimental variogram (filled circle) and the fitted model (solid red line) for clay concentration. Experimental variance (horizontal dashed line) is also showed.
Figure 9.
Maps of the sand content estimates using the three models the three models: (a) model 1 in which the trend includes only the clay content as auxiliary variable, (b) model 2 in which the trend includes only elevation, and (c) model 3 with no trend.
Figure 9.
Maps of the sand content estimates using the three models the three models: (a) model 1 in which the trend includes only the clay content as auxiliary variable, (b) model 2 in which the trend includes only elevation, and (c) model 3 with no trend.
Figure 10.
Maps of the standard deviation of sand estimation using the three models the three models: (a) model 1 in which the trend includes only the clay content as auxiliary variable, (b) model 2 in which the trend includes only elevation, and (c) model 3 with no trend.
Figure 10.
Maps of the standard deviation of sand estimation using the three models the three models: (a) model 1 in which the trend includes only the clay content as auxiliary variable, (b) model 2 in which the trend includes only elevation, and (c) model 3 with no trend.
Table 1.
Summary statistics for the concentrations of sand, silt, clay, and soil organic carbon.
Table 1.
Summary statistics for the concentrations of sand, silt, clay, and soil organic carbon.
Statistics |
Sand (%) |
Silt (%) |
Clay (%) |
SOC (%) |
Minimum |
39.00 |
1.00 |
7.00 |
0.67 |
Median |
63.00 |
22.00 |
15.00 |
2.38 |
Mean |
62.63 |
21.34 |
16.03 |
2.66 |
Maximum |
86.00 |
40.00 |
29.00 |
11.02 |
Stand. Dev. |
9.73 |
6.72 |
5.04 |
1.30 |
Skewness |
-0.26 |
0.00 |
0.72 |
2.69 |
Kurtosis |
2.86 |
3.01 |
3.11 |
15.58 |
Table 2.
Decomposition of the correlation matrix of diffuse reflectance data in principal components (PC) (only PCs with eigenvalue >1 are reported) .
Table 2.
Decomposition of the correlation matrix of diffuse reflectance data in principal components (PC) (only PCs with eigenvalue >1 are reported) .
PC |
Eigenvalue |
Difference |
Explained variance (%) |
Cumulative explained variance (%) |
1 |
186.61 |
166.34 |
85.21 |
85.21 |
2 |
20.27 |
15.89 |
9.26 |
94.47 |
3 |
4.38 |
0.81 |
2.00 |
96.47 |
4 |
3.57 |
2.05 |
1.63 |
98.10 |
5 |
1.52 |
0.40 |
0.69 |
98.79 |
6 |
1.12 |
0.61 |
0.51 |
99.31 |
Table 3.
Summary statistics for the topographic attributes.
Table 3.
Summary statistics for the topographic attributes.
Statistics |
Elevation (m) |
Slope (°) |
Aspect (-) |
LS (-) |
SPI (-) |
TRI (-) |
TWI (-) |
Curvature (-) |
Minimum |
1020.53 |
0.00 |
-1.00 |
0.00 |
0.00 |
0.00 |
0.00 |
-84.37 |
Median |
1168.26 |
22.45 |
245.52 |
4.60 |
0.01 |
0.28 |
5.90 |
0.16 |
Mean |
1171.02 |
23.40 |
213.27 |
5.05 |
0.23 |
0.31 |
5.98 |
-0.02 |
Maximum |
1340.83 |
72.86 |
360.00 |
112.63 |
1131.30 |
7.23 |
24.03 |
89.76 |
Stand. Dev. |
65.79 |
11.44 |
104.17 |
3.36 |
5.74 |
0.19 |
1.65 |
4.47 |
Skewness |
0.18 |
0.45 |
-0.60 |
2.23 |
65.10 |
1.49 |
1.74 |
-1.61 |
Kurtosis |
2.27 |
2.90 |
2.04 |
24.17 |
6394.88 |
8.71 |
12.44 |
40.59 |
Table 4.
Correlation matrix of the coregionalization data set. In bold are the significant correlation coefficients with a probability level p<0.001.
Table 4.
Correlation matrix of the coregionalization data set. In bold are the significant correlation coefficients with a probability level p<0.001.
Variables |
Sand |
Clay |
SOC |
PC1 |
PC2 |
Elevation |
Slope |
Aspect |
LS |
SPI |
TRI |
TWI |
Curvature |
Sand |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
Clay |
-0.76 |
1 |
|
|
|
|
|
|
|
|
|
|
|
SOC |
-0.08 |
0.02 |
1 |
|
|
|
|
|
|
|
|
|
|
PC1 |
-0.05 |
0.07 |
-0.59 |
1 |
|
|
|
|
|
|
|
|
|
PC2 |
-0.19 |
0.28 |
-0.13 |
-0.04 |
1 |
|
|
|
|
|
|
|
|
Elevation |
0.42 |
-0.37 |
0.26 |
-0.36 |
-0.18 |
1 |
|
|
|
|
|
|
|
Slope |
-0.17 |
0.08 |
0.10 |
-0.19 |
0.03 |
-0.17 |
1 |
|
|
|
|
|
|
Aspect |
0.18 |
-0.10 |
-0.01 |
-0.06 |
-0.04 |
0.20 |
0.05 |
1 |
|
|
|
|
|
LS |
-0.15 |
0.05 |
0.00 |
-0.18 |
-0.03 |
-0.14 |
0.85 |
0.04 |
1 |
|
|
|
|
SPI |
-0.04 |
0.06 |
-0.06 |
0.07 |
-0.09 |
-0.06 |
-0.13 |
-0.16 |
0.04 |
1 |
|
|
|
TRI |
-0.19 |
0.09 |
0.10 |
-0.16 |
0.14 |
-0.23 |
0.89 |
0.01 |
0.75 |
-0.08 |
1 |
|
|
TWI |
-0.02 |
0.02 |
-0.21 |
0.11 |
-0.17 |
0.11 |
-0.48 |
-0.13 |
-0.08 |
0.60 |
-0.43 |
1 |
|
Curvature |
-0.08 |
0.21 |
0.08 |
-0.05 |
0.30 |
0.02 |
-0.06 |
0.07 |
-0.33 |
-0.25 |
-0.07 |
-0.37 |
1 |
Table 5.
Automatic structure identification. T stand for trial; x, y: coordinates. For identifying the external drift, different linear combinations (T1 to T17) of the selected ten auxiliary variables were used: 1. clay; 2. SOC; 3. PC1; 4. PC2; 5. elevation; 6. slope; 7. aspect; 8. LS; 9. TRI; 9. TWI.
Table 5.
Automatic structure identification. T stand for trial; x, y: coordinates. For identifying the external drift, different linear combinations (T1 to T17) of the selected ten auxiliary variables were used: 1. clay; 2. SOC; 3. PC1; 4. PC2; 5. elevation; 6. slope; 7. aspect; 8. LS; 9. TRI; 9. TWI.
(a) Identification of the order k |
|
|
|
|
|
|
|
|
|
|
|
|
Mean Error |
|
Mean Squared Error |
|
Mean Rank |
Trial |
Ring1 |
Ring2 |
Total |
|
Ring1 |
Ring2 |
Total |
|
Ring1 |
Ring2 |
Total |
T1: 1 f1 |
0.260 |
-0.645 |
-0.197 |
|
44.020 |
49.220 |
46.640 |
|
7.043 |
6.924 |
6.983 |
T7: 1 f1 f2 |
0.256 |
-0.654 |
-0.203 |
|
47.210 |
51.990 |
49.620 |
|
7.137 |
7.127 |
7.132 |
T9: 1 f1 f2 f3 |
0.529 |
-0.676 |
-0.079 |
|
55.440 |
58.050 |
56.760 |
|
7.933 |
7.574 |
7.752 |
T11: 1 f1 f2 f3 f4 |
0.576 |
-0.601 |
-0.018 |
|
61.680 |
63.080 |
62.380 |
|
8.309 |
8.052 |
8.179 |
T12: 1 f1 f2 f3 f4 f5 |
0.382 |
-0.632 |
-0.129 |
|
64.530 |
69.020 |
66.790 |
|
8.359 |
8.054 |
8.205 |
T2: 1 x y f1 |
0.600 |
-1.021 |
-0.217 |
|
48.780 |
65.710 |
57.320 |
|
7.301 |
8.265 |
7.787 |
T8: 1 x y f1 f2 |
0.745 |
-0.934 |
-0.102 |
|
52.060 |
67.750 |
59.970 |
|
7.568 |
8.392 |
7.983 |
T10: 1 x y f1 f2 f3 |
0.770 |
-0.889 |
-0.067 |
|
64.920 |
74.870 |
69.940 |
|
8.418 |
8.571 |
8.495 |
T13: 1 f1 f2 f3 f4 f5 f6 |
0.734 |
-1.036 |
-0.158 |
|
77.280 |
80.660 |
78.990 |
|
8.967 |
8.754 |
8.860 |
T14: 1 f1 f2 f3 f4 f5 f6 f7 |
0.884 |
-0.888 |
-0.009 |
|
86.140 |
96.730 |
91.480 |
|
9.466 |
9.244 |
9.354 |
T15: 1 f1 f2 f3 f4 f5 f6 f7 f8 |
0.748 |
-0.871 |
-0.068 |
|
102.200 |
103.600 |
102.900 |
|
9.917 |
9.576 |
9.745 |
T16: 1 f1 f2 f3 f4 f5 f6 f7 f8 f9 |
1.008 |
-0.783 |
0.105 |
|
136.400 |
121.700 |
129.000 |
|
10.328 |
9.663 |
9.993 |
T3: 1 f2 |
0.769 |
-1.257 |
-0.253 |
|
108.500 |
115.700 |
112.100 |
|
10.104 |
10.137 |
10.121 |
T17: 1 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 |
1.588 |
-0.874 |
0.347 |
|
204.800 |
163.000 |
183.700 |
|
11.029 |
10.188 |
10.605 |
T5: 1 f3 |
1.118 |
-0.735 |
0.184 |
|
112.900 |
119.100 |
116.000 |
|
10.297 |
10.476 |
10.387 |
T4: 1 x y f2 |
1.408 |
-1.137 |
0.125 |
|
116.400 |
163.900 |
140.300 |
|
10.510 |
10.922 |
10.717 |
T6: 1 x y f3 |
1.518 |
-0.531 |
0.485 |
|
117.900 |
158.400 |
138.300 |
|
10.316 |
11.080 |
10.701 |
Count of measures: Ring1=1260; Ring2=1281; Total=2541 Average Neighborhood Radius: 386.57 m |
(b) Covariance Identification S1 = Nugget effect; S2 = Order 1 Generalized Covariance (G.C.), Scale = 200 m |
Explained/Theorical Variance Ratios |
|
|
|
Generalized covariance |
Mean square error (Q) |
Ring1 |
Ring2 |
Rings |
Jackknife test |
|
|
|
S1 |
S2 |
|
0.701 |
0.959 |
1.007 |
0.984 |
0.985 |
|
|
|
32.380 |
5.021 |
|
0.703 |
0.923 |
1.040 |
0.983 |
0.983 |
|
|
|
41.350 |
0.000 |
|
0.717 |
1.026 |
0.831 |
0.910 |
0.894 |
|
|
|
0.000 |
25.150 |
|
Table 6.
Automatic structure identification. T stand for trial. For identifying the external drift, different linear combinations (T1 to T8) of topographic attributes as auxiliary variables were used: 1. elevation; 2. TRI, 3. aspect, 4. slope, 5. LS; 6. curvature, 7. SPI, 8. TWI.
Table 6.
Automatic structure identification. T stand for trial. For identifying the external drift, different linear combinations (T1 to T8) of topographic attributes as auxiliary variables were used: 1. elevation; 2. TRI, 3. aspect, 4. slope, 5. LS; 6. curvature, 7. SPI, 8. TWI.
(a) Identification of the order k |
|
|
|
|
|
|
|
|
|
|
|
|
Mean Error |
|
Mean Squared Error |
|
Mean Rank |
Trial |
Ring1 |
Ring2 |
Total |
|
Ring1 |
Ring2 |
Total |
|
Ring1 |
Ring2 |
Total |
T1: 1 f1 |
0.525 |
-0.188 |
0.165 |
|
89.660 |
100.500 |
95.150 |
|
89.660 |
100.500 |
95.150 |
T2: 1 f1 f2 |
0.873 |
0.137 |
0.501 |
|
109.100 |
113.500 |
111.300 |
|
109.100 |
113.500 |
111.300 |
T3: 1 f1 f2 f3 |
0.791 |
0.008 |
0.395 |
|
116.000 |
115.900 |
116.000 |
|
116.000 |
115.900 |
116.000 |
T4: 1 f1 f2 f3 f4 |
0.710 |
-0.011 |
0.345 |
|
163.300 |
142.300 |
152.700 |
|
163.300 |
142.300 |
152.700 |
T5: 1 f1 f2 f3 f4 f5 |
0.749 |
0.080 |
0.410 |
|
155.400 |
141.700 |
148.500 |
|
155.400 |
141.700 |
148.500 |
T6: 1 f1 f2 f3 f4 f5 f6 |
0.770 |
0.030 |
0.396 |
|
132.700 |
126.800 |
129.700 |
|
132.700 |
126.800 |
129.700 |
T7: 1 f1 f2 f3 f4 f5 f6 f7 |
0.421 |
0.173 |
0.295 |
|
176.500 |
163.200 |
169.800 |
|
176.500 |
163.200 |
169.800 |
T8: 1 f1 f2 f3 f4 f5 f6 f7 f8 |
0.179 |
0.080 |
0.129 |
|
204.400 |
186.800 |
195.500 |
|
204.400 |
186.800 |
195.500 |
Count of measures: Ring1=3271; Ring2=3343; Total=6614 Average Neighborhood Radius: 493.06 m |
(b) Covariance Identification S1 = Nugget effect; S2 = Order 1 Generalized Covariance (G.C.), Scale = 200 m; S3 = Spline G.C., Scale = 200 m; S4 = Order 3 G.C., Scale = 200 m |
Explained/Theorical Variance Ratios |
|
Generalized covariance |
Mean square error (Q) |
Ring1 |
Ring2 |
Rings |
Jackknife test |
|
S1 |
S2 |
S3 |
S4 |
0.629 |
0.989 |
1.014 |
1.002 |
1.003 |
|
82.470 |
0.826 |
0.000 |
0.000 |
0.629 |
0.987 |
1.016 |
1.002 |
1.003 |
|
84.120 |
0.000 |
0.000 |
0.000 |
0.677 |
0.956 |
0.821 |
0.879 |
0.870 |
|
0.000 |
47.980 |
0.000 |
0.000 |
Table 7.
Results of cross-validation test for the three models: (1) trend includes only the clay content as auxiliary variable, (2) the trend includes only elevation, and (3) no trend. RMSSE is the root mean squared standardized error by standard deviation of kriging; r is the correlation coefficient between observation and estimation; ρ is the correlation coefficient between standardized error and estimation.
Table 7.
Results of cross-validation test for the three models: (1) trend includes only the clay content as auxiliary variable, (2) the trend includes only elevation, and (3) no trend. RMSSE is the root mean squared standardized error by standard deviation of kriging; r is the correlation coefficient between observation and estimation; ρ is the correlation coefficient between standardized error and estimation.
Model |
Mean error |
RMSSE |
r |
ρ |
1 |
-0.0215 |
0.97 |
0.78 |
0.02 |
2 |
0.0252 |
0.90 |
0.38 |
0.02 |
3 |
-0.1398 |
1.13 |
0.35 |
0.16 |