Preprint
Article

Development of Soil Erosion Susceptibility Model Using UAV Photogrammetry in a Timber Harvesting Area, South Korea


This version is not peer-reviewed

Submitted:

26 September 2023

Posted:

27 September 2023

Abstract
Unmanned aerial vehicle (UAV) systems are widely used in many forest-related fields owing to their cost-effective and precise surveying technology. This study classified erosion susceptibility (ES) in a timber harvesting area using machine learning (ML) and statistical approaches. To generate the dataset for training and testing, the digital surface model (DSM) of difference (DoD) for July–June was used as the dependent variable, and six terrain maps derived from the June DSM were used as independent variables. The ES threshold was set at 5 cm for the binary classification of ES pixels, and ML (random forest [RF] and extreme gradient boosting [XGB]) and statistical (logistic regression [LR]) algorithms were used for model development. The overall accuracy (OA) and the area under the receiver operating characteristic curve (AUC) were calculated for model assessment and validation. Although no model reached the acceptable level (AUC > 0.7), the XGB model showed the best performance in terms of processing time, OA, and AUC (2 h, 64%, and 0.63, respectively). Despite its low AUC and accuracy, the XGB model identified the wheel tracks and edges of the operation road as susceptible areas in its ES map.
Subject: Environmental and Earth Sciences  -   Environmental Science

1. Introduction

Unmanned aerial vehicle (UAV) surveys have demonstrated efficient and precise data acquisition in many forest-related fields [1,2,3,4]. However, only a few studies have used UAV surveys for the environmental analysis of forest soil [5,6,7]. These studies detected and monitored soil surface deformation (SSD) in timber harvesting areas. Despite the application of UAV sensors in forestry, a prediction model for micro-SSD has not yet been attempted in forest environmental fields, unlike in biomass prediction studies [8,9,10]. Even predicting macro-SSD, such as landslides, remains challenging in many recent landslide susceptibility studies [11,12,13].
A recent study revealed that a UAV system is capable of detecting SSD in a timber harvesting area [7]. The UAV photogrammetric approach was used to acquire point cloud data (PCD). 2D images were collected monthly by UAV surveys and processed using structure from motion (SfM). The airborne PCD were geo-referenced to align all acquired PCD by correcting the coordinate data with ground control points (GCPs) installed on stumps. After the PCD alignment, the elevation difference at each pixel was calculated by subtracting the monthly digital surface models (DSMs) from one another. The height difference indicated the SSD that occurred in the timber harvesting area, and the 3D SSDs were validated against ground truths to monitor the seasonal effects of timber harvesting.
Factors such as soil texture, precipitation, vegetation, topography, and surface cover are related to displacement generation, as described by the revised universal soil loss equation used in previous studies [14,15]. Although the factors behind SSD occurrence are well known, modeling studies have focused on predicting landslides at meter-class digital elevation model (DEM) resolution, which is inappropriate for detecting and analyzing micro-SSD in three-dimensional (3D) images [11,16,17,18]. For decades, landslide susceptibility models have been developed using data derived from remote sensing platforms such as satellites and aircraft [19,20,21]. These studies sampled landslides from 3D images derived from remote sensing and used independent variables such as terrain, vegetation, and land use maps to classify landslide-susceptible areas in meter-class resolution images. Terrain variables consist of slope, aspect, topographic wetness index (TWI), terrain roughness index (TRI), and curvature, which are related to the physical rainfall energy that results in landslides [22,23,24]. Vegetation covering the soil surface is also related to landslide occurrence because it mitigates rainfall energy, and it has been represented by the normalized difference vegetation index (NDVI) in landslide susceptibility models [24,25]. Moreover, classification maps such as soil texture and land use have been used as nominal variables to assess the erosion vulnerability caused by human activity, which may accelerate landslides.
Most preceding gully erosion and landslide susceptibility modeling studies selected machine learning (ML) and statistical algorithms for the classification process [22,24,26,27,28,29,30]. Landslide modeling studies have used landslide samples for supervised learning with algorithms such as random forest (RF) [27,31], extreme gradient boosting (XGB) [32], logistic regression (LR) [30], support vector machine [26], and artificial neural network (ANN) [22,24,28]. During the classification process, the sampled landslide cells were trained and tested as target data, also known as the dependent variable, and the variables from the terrain, vegetation, and cover maps were prepared and tested as independent variables. Moreover, to overcome the overfitting issues of the micro-target data, cross-validation methods were used in the training and testing processes to enhance the classification performance. Subsequently, the hyperparameters were tuned to develop a model suitable for the dataset. The overall accuracy (OA) was calculated from the confusion matrix of the classification results, and the model performance was verified with the F1 score and the area under the receiver operating characteristic curve (ROC-AUC) [21,27,31]. In the final stages, related studies mapped the classification results in a GIS environment.
Despite the difficulty of detecting micro-SSD from UAV-derived data [7], we took a further step and classified erosion susceptibility (ES) pixels from the UAV photogrammetric DSM. In this study, we investigated the feasibility of classifying micro-SSD from 3D soil surface data by developing a classification model with an ES threshold of 5 cm. Moreover, the training and testing of millions of 3D data points were optimized by tuning the sample size and the number of dataset splits in the stratified cross-validation (SCV) process. Finally, the performances of the developed models were compared, and the best-performing model was used for mapping.

2. Materials and Methods

2.1. Overall Process of the Study

The goal of this study was to develop an ES model from UAV photogrammetry of a timber harvesting area. A schematic of the study is presented in Figure 1. UAV surveys were conducted monthly to collect 2D images for UAV photogrammetry in the post-timber harvesting area. Monthly PCD were built using the structure from motion (SfM) algorithm, and GCP-based geo-referencing was then applied to align the monthly PCD for the DoD calculations. DSMs were generated for each month, and the elevation differences between the DSMs were calculated as the target data of the ES model. The DSM of the earlier month was used to generate the terrain features for predicting the target data, that is, the ES pixels in the dataset. The dataset was trained and tested with the XGB, RF, and LR classification algorithms, and the model performances were calculated as OA, F1, and AUC. The best ES model was selected by comparing the model performances and was mapped to identify the ES of the timber harvesting area.

2.2. Study Area

The timber harvesting site, situated in the research forest of Kangwon National University, Republic of Korea (37°46′34.4′′ N, 127°49′41.1′′ E; Figure 2), was cleared before UAV photogrammetry was employed to detect SSD in the canopy-opened site. In March 2022, timber was harvested on a 3-ha total area. The area has a temperate climate, with summer (June to August) being the wettest season. During the study period, the monthly precipitation averaged 300 mm during the heavy rainy season. With elevations ranging from roughly 508 to 628 m above sea level, the site has an average slope of 47%. According to American soil taxonomy, the region's soils are dark brown sandy loam, and the soil type corresponds to the Mui series (coarse loamy, mixed, Typic Humudepts). Logging trails were formed on the center and right sides of the study area. From March through June, the surface of the area was covered with rocks, logging waste, forest soil, and sparse vegetation. The steep slopes and abundant rainfall at the research site provide optimal conditions for runoff and SSD.

2.3. 2D Image Collection from UAV System on Timber Harvesting Site

Prior to conducting the UAV surveys, GCPs must be installed to geo-reference the 3D data. Because it is difficult to find non-deforming objects in forests, we prepared 40 × 40 cm Fomex texture plates as GCPs and installed them on stumps. A total of 29 GCPs were installed on recognizable stumps, and the center coordinates of the installed GCPs were then collected using Trimble R12i GNSS receivers (Figure 3). In parallel with the GCP installation, we prepared polyvinyl chloride pipes with attached rulers as validation points (VPs). A total of 24 VPs were installed to validate the DoD, which was used as the dependent variable in the ML models.
The aerial 2D images were collected using a Matrice 300 (7 kg; Da-Jiang Innovations [DJI]) as the platform and a Zenmuse H20T (0.82 kg) as the sensor. To avoid the risk of crashing during airborne 2D image collection without real-time kinematic (RTK) positioning, a GPS-based vertically parallel flight method was used to acquire a high-resolution DSM of the steep slopes (Kim et al., 2023). For the automatic UAV flights, the side overlap, front overlap, and flight speed were set at 90%, 80%, and 5 m/s, respectively. In particular, considering the slope of the study area, the flight heights were set at 100 m and 140 m. Each UAV survey collected over 200 images of the total study site.

2.4. Variable Generation for ES Model

To analyze 3D data using a ML model, it is necessary to generate terrain variables from the 3D DSM. In previous gully erosion and landslide susceptibility studies, terrain variables were generated from space- and airborne-derived DEMs, and the models developed from preceding studies were able to accurately classify macro-SSD (e.g., gully erosion and landslides) [21,22,28,33,34].
However, the classification modeling process in this study had to be performed at centimeter-class resolution to classify micro-SSD. Thus, the UAV photogrammetric method was used to acquire centimeter-class resolution data. In this process, all images collected by the automatic flight method were aligned using the coordinate data embedded in the images in Agisoft Metashape Professional version 1.5.1 (Agisoft LLC., St. Petersburg, Russia). The PCD were then generated from the images using the structure from motion (SfM) algorithm. The Metashape software provides photogrammetric options that can be customized for each process (Table 1).
During photo alignment, feature points were matched between images through SfM to generate tie points and depth maps. During dense cloud generation, the PCD were generated from the tie points.
Each monthly PCD must be aligned with the lowest possible spatial root mean square error (RMSE) for the SSD calculation. In landslide-related UAV studies, GCPs have been used to geo-reference the XYZ coordinates in PCD [35,36,37]. The center coordinates of all 29 GCPs, surveyed with a global navigation satellite system (GNSS) during fieldwork, were imported into the PCD in shapefile format. These GCPs were used to apply the coordinate data and validate the spatial error throughout the process.
Height differences per pixel were calculated from the spatially aligned DSMs using ArcGIS Pro (ESRI Inc., Redlands, CA, USA). In ArcGIS Pro, the raster calculator tool was used to calculate the Z (height) values. The pixels of the pre-DSM (June 10, 2022) were subtracted from the pixels of the post-DSM (July 9, 2022), and the resolution was considered during the calculation.
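The per-pixel subtraction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ArcGIS Pro raster calculator workflow used in the study: the function name `dsm_of_difference`, the nested-list grid representation, and the `-9999.0` no-data marker are all hypothetical, and both grids are assumed to be already aligned to the same extent and resolution.

```python
def dsm_of_difference(pre_dsm, post_dsm, nodata=-9999.0):
    """Return post minus pre for every pixel of two aligned elevation grids.

    Grids are lists of rows (elevations in metres). A pixel that is no-data
    in either grid is returned as None. The no-data value is an assumption.
    """
    if len(pre_dsm) != len(post_dsm):
        raise ValueError("DSMs must have the same number of rows")
    dod = []
    for pre_row, post_row in zip(pre_dsm, post_dsm):
        if len(pre_row) != len(post_row):
            raise ValueError("DSM rows must have the same length")
        dod.append([
            None if nodata in (pre, post) else post - pre
            for pre, post in zip(pre_row, post_row)
        ])
    return dod


# Hypothetical 2 x 2 example: negative values mean the surface was lowered.
june = [[510.00, 510.20], [509.80, -9999.0]]
july = [[509.92, 510.21], [509.80, 509.00]]
dod = dsm_of_difference(june, july)
```

With this sign convention (post minus pre), surface lowering between the two surveys appears as a negative height difference.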
As the DSM of difference (DoD) was calculated from the height differences at each pixel, the DoD required validation. To validate the precision of the SSD in the DoD, the height values at each installed validation point (VP) were compared with the values at the same points in the DoD file. The coordinate data acquired from the VPs by GNSS were imported as points in shapefile format onto the DoD map. The pixel values calculated using the DoD method were then compared with the ground truth data recorded during the field survey, and the RMSE was calculated for their assessment. Moreover, considering the slope, precipitation, and alignment error, the most precise and interpretable average erosion over the whole monitoring period was the average erosion height from the DoD for October–September. The threshold of the ES height for reclassification was therefore set at 5 cm, based on the average erosion level in the DoD for October–September [7].
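The RMSE used to compare DoD values with the ruler readings at the validation points follows the standard definition. A minimal sketch is given below; the function name `rmse` and the sample values are hypothetical and do not come from the study.

```python
import math


def rmse(dod_values, ground_truth):
    """Root mean square error between DoD values and field measurements."""
    if len(dod_values) != len(ground_truth) or not dod_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(d - g) ** 2 for d, g in zip(dod_values, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical height changes (m) at three validation points.
dod_at_vps = [-0.06, -0.02, 0.01]
ruler_readings = [-0.05, -0.04, 0.00]
error_m = rmse(dod_at_vps, ruler_readings)
```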
To predict ES from 3D terrain data, the DSM must be converted to its morphological features for analysis. This process was conducted using ArcGIS Pro, and slope, aspect, TWI, profile curvature (PRC), plan curvature (PLC), and TRI were calculated from the DSM. However, unlike the other independent variables, which were calculated directly using ArcGIS tools, the TWI and TRI had to be generated using semi-manual calculations. The TWI indicates the water flow from the upper slopes, which may contribute to erosion. The equation for TWI is as follows:
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right),$$
where α is the contributing upper slope pixels and β is the slope gradient of the neighboring pixels of the slope.
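As a worked illustration of the TWI definition, the index for a single pixel can be computed as the natural logarithm of α divided by tan β. The sketch below is not the ArcGIS procedure used in the study; the function name `twi`, the use of degrees for the slope input, and the minimum-slope clamp that avoids division by zero on flat cells are assumptions made here.

```python
import math


def twi(contributing_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(alpha / tan(beta)) for one pixel.

    contributing_area: upslope contributing area alpha (must be > 0).
    slope_deg: local slope beta in degrees; clamped to min_slope_deg
    (an assumed safeguard) so that flat cells do not divide by zero.
    """
    if contributing_area <= 0:
        raise ValueError("contributing_area must be positive")
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(contributing_area / math.tan(beta))


# For the same contributing area, a gentler slope gives a higher TWI.
gentle = twi(10.0, 10.0)
steep = twi(10.0, 30.0)
```

The index rises with the contributing area and falls with the slope, which is why water is expected to accumulate on gentle cells that drain a large upslope area.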
Moreover, the TRI indicates the morphological influence of water flow. In ArcGIS, the maximum, mean, and minimum elevations within the neighborhood of each input pixel of the DSM were first calculated using the “focal statistics tool.” The TRI equation is as follows:
$$\mathrm{TRI} = \frac{FS_{\mathrm{Mean}} - FS_{\mathrm{Minimum}}}{FS_{\mathrm{Maximum}} - FS_{\mathrm{Minimum}}},$$
where $FS_{\mathrm{Mean}}$ is the mean elevation of focal statistics, $FS_{\mathrm{Minimum}}$ is the minimum elevation of focal statistics, and $FS_{\mathrm{Maximum}}$ is the maximum elevation of the focal statistics.
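The focal-statistics form of the TRI, read here as (mean − minimum) / (maximum − minimum) over each pixel's neighborhood, can be sketched without GIS software. The function name `focal_tri`, the 3 × 3 square window, the truncation of windows at the grid edge, and the value 0.0 returned for perfectly flat windows are assumptions of this sketch; the study itself used the ArcGIS focal statistics tool.

```python
def focal_tri(dsm, radius=1):
    """Terrain roughness index from focal mean, minimum, and maximum.

    dsm: list of rows of elevations. Each output cell is
    (mean - min) / (max - min) over a (2 * radius + 1) square window,
    truncated at the grid edges. Flat windows return 0.0 (assumed).
    """
    rows, cols = len(dsm), len(dsm[0])
    tri = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                dsm[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            low, high = min(window), max(window)
            mean = sum(window) / len(window)
            tri[r][c] = 0.0 if high == low else (mean - low) / (high - low)
    return tri


# Hypothetical grid with a single raised cell in the middle.
grid = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
roughness = focal_tri(grid)
```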
All the terrain variables were masked and adjusted appropriately for exact pixel numbers and values at the appropriate pixel locations. Subsequently, all maps were exported in the raster format to generate the data frame.
Geo-referencing was conducted using GCPs in the Metashape environment. Initially, the GCPs were manually registered in the PCD, and the program automatically selected the corresponding 2D images. The exact centers of the GCPs were then manually selected in the 2D images. The center points were manually adjusted or deleted, where relevant, before calculating the offsets of the GCP centers (distances in cm), which represented the GCP geo-referencing errors for June, July, September, and October.

2.5. Development of the ES Model Using a Classification Algorithm

All dependent and independent variable maps were loaded into the R environment for stacking. The stacked pixel data from the maps were extracted and converted to CSV format. Moreover, the datasets for ES analysis were labeled by applying the ES threshold of 5 cm to each pixel. The datasets generated from the stacked maps were transferred to the Python 3.10.6 environment for faster analysis using the NumPy package. In total, 34,088,223 stacked pixel data points were imported; the data were scaled using the min-max method, and 10% of the total data were randomly sampled without replacement for training and testing. The sample data were trained and tested using SCV with each of the three classification algorithms. First, in the SCV process, three sets of trials (5, 10, and 100 splits) were trained and tested using the ML- and statistics-based models. Subsequently, classification models were developed, and the performance of each model was verified using precision and recall values. Binary classification requires a suitable classification algorithm. To classify the 34-million-pixel DoD SSD data, which are complex and contain small terrain features in contrast to landslides, strong and precise ML algorithms such as RF and XGB are required. In addition, to assess the applicability of ML models in ES analysis, a statistical algorithm, LR, was used for comparison with the ML-based models.
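The min-max scaling and stratified splitting steps described above can be sketched with the standard library alone. This is an illustration of the general technique, not the authors' code: the function names `min_max_scale` and `stratified_folds`, the fixed seed, and the toy label vector are hypothetical, and the study does not state which implementation it used.

```python
import random


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def stratified_folds(labels, n_splits, seed=0):
    """Split sample indices into n_splits folds with similar class ratios.

    Indices of each class are shuffled and dealt round-robin into the
    folds, so every fold keeps roughly the overall ES / non-ES proportion.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Hypothetical imbalanced labels: 20 ES pixels (1) and 80 non-ES pixels (0).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, n_splits=5)
scaled_slope = min_max_scale([2.0, 4.0, 6.0])
```

In each round of cross-validation, one fold serves as the test set and the remaining folds as the training set; stratification matters here because ES pixels are the minority class.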
Table 2. Hyperparameters utilized in the classification models.
| XGB parameter | Value | RF parameter | Value | LR parameter | Value |
|---|---|---|---|---|---|
| Max depth | 5 | Max depth | 100 | C | 100 |
| N estimators | 1000 | Max_features | 3 | Max_iteration | 500 |
| Learning rate | 0.1 | Min_samples_leaf | 5 | Solver | liblinear |
| Min_child_weight | 1 | Min_samples_split | 12 | Penalty | l1 |
| subsample | 0.8 | N estimators | 500 | | |
| colsample_bytree | 0.8 | | | | |
| objective | binary:logistic | | | | |
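For readers who want to reproduce the settings in Table 2, the values can be written as plain Python dictionaries. The keyword names below follow the xgboost and scikit-learn Python APIs; the study states only that Python was used, so the choice of these libraries and the dictionary names are assumptions.

```python
# Hyperparameters from Table 2, keyed by the argument names used in the
# xgboost and scikit-learn Python APIs (library choice is an assumption).
XGB_PARAMS = {
    "max_depth": 5,
    "n_estimators": 1000,
    "learning_rate": 0.1,
    "min_child_weight": 1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "objective": "binary:logistic",
}

RF_PARAMS = {
    "max_depth": 100,
    "max_features": 3,
    "min_samples_leaf": 5,
    "min_samples_split": 12,
    "n_estimators": 500,
}

LR_PARAMS = {
    "C": 100,
    "max_iter": 500,
    "solver": "liblinear",
    "penalty": "l1",
}
```

If those libraries are installed, the dictionaries could be unpacked into the estimators, for example `XGBClassifier(**XGB_PARAMS)`, `RandomForestClassifier(**RF_PARAMS)`, and `LogisticRegression(**LR_PARAMS)`.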

2.6. Assessments of the ES Models

To evaluate and compare model performances, universal indicators such as F1 and ROC-AUC were used for the classification results [38,39,40]. The OA, recall, and precision were calculated using the confusion matrix derived from the classification results (Table 3). The OA was calculated as the sum of true negatives (TN) and true positives (TP) divided by the sum of TN, TP, false negatives (FN), and false positives (FP). The F1 score was calculated as the harmonic mean of recall and precision. The ROC curve was plotted for each classification result to confirm whether the models were fitted correctly. The true positive rate (TPR) and false positive rate (FPR) were used to draw the ROC plot with a size of 1 × 1, and the AUC was calculated to determine the success rate of the ES models. Finally, the total classification results were joined to the coordinate data and mapped using ArcGIS Pro for visualization.
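The assessment metrics above follow directly from the confusion-matrix counts and from the ROC curve. The sketch below uses the standard definitions; the function names and the toy inputs are hypothetical, and the AUC is obtained by trapezoidal integration of TPR over FPR while sweeping the decision threshold from high to low.

```python
def classification_scores(tp, fp, tn, fn):
    """Overall accuracy, precision, recall, and F1 from confusion counts."""
    oa = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"oa": oa, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule.

    y_true holds 0/1 labels and scores the predicted probabilities.
    Samples with tied scores are added to the curve together.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = area = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return area


# Toy example: one ES pixel found, one missed, one false alarm.
scores_demo = classification_scores(tp=1, fp=1, tn=2, fn=1)
auc_demo = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because the OA counts correct non-ES predictions as well, it can remain fairly high on imbalanced data even when precision and recall for the ES class are low, which is why F1 and AUC are reported alongside it.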

3. Results

3.1. Dataset for Training and Testing

From the UAV surveys conducted on June 10 and July 9, 235 and 233 2D images were collected, respectively. The PCD for June and July were generated from the UAV-derived 2D images via a photogrammetric process using SfM. These PCD were geo-referenced using 29 GCPs to align the June and July DSM and then filtered manually using the Agisoft software. The alignment of each PCD was processed from the GCP centers of the PCD with a spatial error of 11.1 cm. In the final data processing, DSMs of <2.7-cm resolution were generated from the preprocessed PCD for June and July (Figure 3). From the validation of the DoD map, the precision of the SSD was calculated for 24 VPs with an RMSE of 8.8 cm, which was the difference between the 3D data and ground truth measurements.
The July–June DoD was first calculated as numerical data and was then reclassified according to the 5 cm threshold for use in the classification algorithms (Figure 4 and Figure 8b). The ES model used the reclassified DoD as the dependent variable. For the independent variables, six terrain variables (slope, aspect, TWI, PRC, PLC, and TRI) were generated from the June DSM and stacked for ES analysis (Figure 5).
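The reclassification of the numerical DoD into a binary target can be sketched as a simple threshold test. The text gives the 5 cm threshold but not the sign convention, so this sketch assumes that the DoD is post minus pre (in metres) and that a pixel counts as ES when the surface was lowered by at least 5 cm; the function name `reclassify_dod` and the use of `None` for no-data pixels are also hypothetical.

```python
def reclassify_dod(dod, threshold_m=0.05):
    """Convert a DoD grid (metres, post minus pre) into 0/1 ES labels.

    Assumption: a pixel is ES (1) when elevation dropped by at least
    threshold_m; all other valid pixels are non-ES (0). None stays None.
    """
    return [
        [None if value is None else int(value <= -threshold_m) for value in row]
        for row in dod
    ]


# Hypothetical 2 x 2 DoD: only the first pixel lost more than 5 cm.
labels = reclassify_dod([[-0.08, -0.02], [0.10, None]])
```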

3.2. Model Comparisons

Classifications were first performed with 10,000 samples using 5-, 10-, and 100-split SCV sets. In the 5-split SCV sets, the XGB model could not properly classify ES cells from the total target data. Moreover, because non-ES cells accounted for most of the target data, overfitting occurred, mainly in the form of ES cells being classified as non-ES. These issues also occurred in the 10-split SCV sets, where the early classification performance (AUC) was lower than 0.6 and did not improve by the end of the SCV process. Furthermore, with the 100-split SCV set, the final model performance improved only slightly, by 0.03, compared with the 5-split SCV set.
The SCV results of XGB from 10% of the total dataset differed significantly from those with the 10,000 samples, and the AUC scores improved by almost 0.1. Moreover, the AUC results from the 5-, 10-, and 100-split SCV sets showed no significant differences in model performance; thus, increasing the number of SCV splits was almost meaningless (Figure 6).
With the 10% sample dataset, RF and LR also showed a significant difference in AUC compared with the 10,000 samples. The AUCs from the 5, 10, and 100 splits of both models showed no significant difference in the training and testing of the 10% sampled data (Figure 7). However, the processing durations of RF and LR with the 10% sample were significantly longer than those with the 10,000 samples, at 14 h and 21 h, respectively.
The OAs of all models were over 64%; however, the F1 scores revealed overfitting issues in the developed models. The F1 scores of the RF, XGB, and LR models were 0.53, 0.54, and 0.49, respectively. The confusion matrices showed that far fewer cells were classified as ES than as non-ES, revealing that all models had difficulty classifying erosion-occurring cells. Although the LR model achieved a classification accuracy of 64%, its F1 score was lower than those of the RF and XGB models. The SCV durations were 14, 2, and 21 h for RF, XGB, and LR, respectively. As expected, XGB had the shortest duration for the total analysis, whereas LR had the longest. Moreover, the AUC of XGB exceeded those of RF and LR by 0.03 and 0.07, respectively (Figure 6a and Figure 7). Based on the comparison of the three models, XGB was judged the most appropriate for ES analysis and mapping.

3.3. ES Mapping of the Study Site

Based on the model performance, mapping was conducted with the best ES model, XGB. Because the overfitting was biased toward non-ES cells, the XGB map contained more non-ES area than the reference map, which is the DoD (Figure 5). On the lower slope of the site, the ES area was much smaller than that in the reference map. However, the XGB model identified ES areas on the edges of the operational roads in the center and on the left side (Figure 8a). Areas on the left side of the map where the surface was covered by rocks were also classified as ES areas (Figure 8a). The total ES area was calculated to be approximately 40 m², compared with the reference erosion area of approximately 104 m² (Figure 8). Furthermore, in the ES analysis of the micro-SSD, XGB identified the wheel tracks detected in the reference DoD (Figure 9).
Figure 8. Comparison of the extreme gradient boosting (XGB) model and the target data (DoD) by mapping: (a) map of the enhanced XGB model; (b) DoD map with the ES threshold at 5 cm.

4. Discussion

4.1. 3D Surface Model Generated from the UAV System

Herbaceous plants grew throughout the monitoring period. Although the alignment errors of the July and June DSMs were comparable to those in steep-slope studies, the SSDs at each plot could only be verified in the July–June DoD because of vegetation growth. Moreover, distortion occurred in the DSM of July 9, which appeared as a geo-referencing error of 23 cm, relatively high compared with the geo-referencing error of 12 cm on June 10 [7]. These spatial issues may not be comparable to those in landslide susceptibility studies because the samples in those studies were derived not from a DoD but from landslide cells [24,41]. Thus, this relatively high spatial error may have caused the AUC to remain at approximately 0.6, even in the 100-split SCV tests with 10% of the total data. We therefore assume that DoD quality is essential for enhancing model performance.
The selection of independent variables in this study was based on a case study of landslide susceptibility, in which variables were generated from a LiDAR-derived DEM [22]. Six terrain variables were generated from the DEM: slope, solar radiation, PRC, PLC, TWI, and upslope drainage area. It was difficult to compare performances using validation scores (i.e., AUC), as in other landslide susceptibility modeling studies, because the AUC of the ANN model was not reported in that study. However, the two approaches are comparable in that the DEM and the DSM were both generated from PCD, because LiDAR and photogrammetric data share the same features and both result in a digital surface model. Moreover, the terrain variables in that study were also generated from a 3D model, comparable to the independent variables in this study.
In contrast to this study, recent studies [25,27,31,42,43,44] used additional maps of vegetation features (NDVI), land cover features (land use), classified features (precipitation), and distance features (distance from water sources). However, because this study classified the ES of a 2.5-ha area from photogrammetric data, the vegetation, land cover, and distance features could not readily be generated from the 2D images. Moreover, precipitation features could not be generated because the measurement from the station covers almost a kilometer, a scale far larger than our 2.5-ha site.

4.2. Performances from ES Models

The XGB models did not classify as well as the landslide susceptibility models reviewed in an ML approach study [40]. That recent landslide study reported an AUC and OA of 0.86 and 78%, respectively, for the k-nearest neighbor model; 0.87 and 79%, respectively, for the ANN model; and 0.89 and 80%, respectively, for the RF model, which are considerably higher than those of our XGB models. However, these preceding models used 4–18 more variables than did the XGB model. Furthermore, the variables used in the previous study were mainly NDVI, land cover, lithology, distance maps from roads, and hydrology, which the UAV photogrammetry process used in this study could not generate. Thus, it is more appropriate to compare our attempt with studies that used terrain variables derived from a LiDAR-DEM [22,45]. These DEM-based studies investigated landslide susceptibility using slope, TWI, PRC, PLC, and solar radiation variables. However, they did not report the model performance for the variables derived from the LiDAR-DEM.
Although the original data used in the ML studies had meter-class resolution, ML-based spatial classification studies did not achieve excellent validation scores [22,45,46]. Thus, the resolution of the 3D data may not be the reason for the low model performance. However, Lidberg et al. [46], who varied the resolution of the variables, found significant differences in variable importance. Their study showed that the TWI based on a 24 m resolution DEM affected the OA of the model by ~50%, whereas the TWI based on a 48 m resolution DEM affected it by 30%. In comparison with Lidberg et al. [46], the variables derived from the 3 cm class DSM used in this study may have affected the model performance. We suggest that further research be conducted to confirm the applicability of variables derived from micro-resolution DEMs for training and testing high-resolution ES models.
Consequently, the research direction of the ES analysis in this study was appropriate because the terrain variable selections and generations were similar to those in previous studies. Despite utilizing the SCV method, the limitation of this study was that the variables used in the classification process were insufficient for analyzing ES (AUC of 0.63). Thus, the AUC of XGB did not correspond to the acceptable model standard, which required an AUC of 0.7 or higher. It may also be speculated that the number of target ES cells was too small to be trained and tested using the algorithm, even when a powerful algorithm was used in the model.
The LR model, which is a statistical model, could not properly process binary classification from the trained target data. The average AUC from the LR model was 0.56, a poor model for ML studies [40,43]. This result may be attributed to the characteristics of the LR algorithm. Previous ML studies revealed that single-regulated training and testing in the LR algorithm may not be appropriate for verifying a large dataset. The RF models, which are ML models, could process binary classification better than the LR models because of the multi-decision tree process in the RF algorithm. In most RF studies using GIS, the bagging process enhances model performance through training and testing several times by tuning the main hyperparameter and the number of trees [47,48,49]. However, comparing the LR and RF performances from previous studies, the RF results showed better performances than the LR results, with a maximum of 15% and 0.06 at the OA and AUC, respectively [49,50,51]. From this and previous studies, it may be assumed that the LR algorithm processes the dataset only once by regulating the classification process in the sigmoid function. Based on this characteristic of the LR algorithm, we assumed that the LR model has a theoretical limitation in classifying a large spatial dataset compared with the RF model.
The duration issues were probably due to the characteristics of the RF and LR algorithms. In the theoretical approach of each algorithm in RF studies, the algorithm processes a multi-decision tree at each bagging process without a boosting process, unlike the XGB model. Moreover, previous studies have shown that XGB has better model performance than RF. Thus, in comparison with this study and related studies, we assumed that the boosting process in the XGB algorithm may have resulted in better performance in terms of the duration and AUC of the XGB model than those of the RF model in large spatial data processing.

4.3. Erosion Susceptible Area Classified by the XGB Model

Here, we performed the whole process from data acquisition using UAV photogrammetry to the prediction of ES using ML-based classification models. The threshold was based on the average erosion of a precise DoD map from September to October [7]. In the previous research, the DoD map for October–September was acquired with the lowest alignment error of 3 cm using 24 GCP centers. Moreover, the reliable average 1-month erosion level was calculated as 5.13 cm on the total slope. Thus, a threshold of 5 cm was considered appropriate based on the DoD for October–September [7]. The XGB model analyzed and detected high-ES regions related to topographic features, similar to landslide models [22,27,52]. However, the AUC of our XGB model was lower than those in previous studies. We assume that this was due to the spatial distortion that occurred during the data acquisition in July.
Moreover, the training and testing samples were derived from the total area, which differed from the landslide samples. The sampling method used to classify landslide susceptibility was based on selecting areas where landslides occurred and non-occurring regions, which differs from that used in this study. Thus, we postulate that the sampling method can be improved by selecting specific erosion-occurring pixels for appropriate samples in future studies.

5. Conclusions

In this study, we mainly investigated the feasibility of classifying ES so that the classification resembles the results from the DoD. Although the DoD used to develop the ES model was calculated from DSMs with a high alignment error, the ML models could classify ES from the terrain variables against the target data. The ML models, XGB and RF, performed better than the LR statistical model in terms of AUC and processing time. However, using the SCV method to classify the target data did not raise the models to an acceptable performance level. Moreover, despite using 10% of the total data, the AUCs from XGB and RF only slightly exceeded 0.6, which is below the acceptable level (AUC > 0.7) and should be improved before practical application. Regardless of these issues, the XGB map sensitively classified the actual ES areas at the wheel tracks and the edges of the logging tracks at the study site. In future work, the model could be enhanced by adding vegetation maps from multispectral sensors and a lithology map from soil texture analysis. Moreover, the target data should be significantly improved by using RTK to reduce alignment errors and distortions, or by using enhanced sensors such as LiDAR.

Author Contributions

Conceptualization, B.C. and J.K.; Methodology, B.C., J.K., and I.K.; Validation, B.C., J.K., and I.K.; Formal analysis, B.C. and J.K.; Investigation, B.C., I.K., and J.K.; Resources, B.C. and I.K.; Data curation, B.C., J.K., and I.K.; Writing-Original Draft Preparation, B.C. and J.K.; Writing-Review and Editing, J.K., I.K., and B.C.; Visualization, J.K. and I.K.; Supervision, B.C.

Funding

This research was supported by the ‘R&D Program for Forest Science Technology (Project No. 2021367B10-2323-BD01)’ and the ‘R&D Program for Forest Science Technology (Project No. 2019151D10-2323-0301)’ of the Korea Forest Service (Korea Forestry Promotion Institute).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Torresan, C.; Berton, A.; Carotenuto, F.; Di Gennaro, S.F.; Gioli, B.; Matese, A.; Miglietta, F.; Vagnoli, C.; Zaldei, A.; Wallace, L. Forestry applications of UAVs in Europe: A review. Int. J. Remote Sens. 2017, 38, 2427–2447. https://doi.org/10.1080/01431161.2016.1252477. [CrossRef]
  2. Sankey, T.; Donager, J.; McVay, J.; Sankey, J.B. UAV lidar and hyperspectral fusion for forest monitoring in the southwestern USA. Remote Sens. Environ. 2017, 195, 30–43. https://doi.org/10.1016/j.rse.2017.04.007. [CrossRef]
  3. Hyyppä, E.; Hyyppä, J.; Hakala, T.; Kukko, A.; Wulder, M.A.; White, J.C.; Pyörälä, J.; Yu, X.; Wang, Y.; Virtanen, J.; et al. Under-canopy UAV laser scanning for accurate forest field measurements. ISPRS J. Photogramm. 2020, 164, 41–60. https://doi.org/10.1016/j.isprsjprs.2020.03.021. [CrossRef]
  4. Schiefer, F.; Kattenborn, T.; Frick, A.; Frey, J.; Schall, P.; Koch, B.; Schmidtlein, S. Mapping forest tree species in high resolution UAV-based RGB-imagery by means of convolutional neural networks. ISPRS J. Photogramm. 2020, 170, 205–215. https://doi.org/10.1016/j.isprsjprs.2020.10.015. [CrossRef]
  5. Nevalainen, P.; Salmivaara, A.; Ala-Ilomäki, J.; Launiainen, S.; Hiedanpää, J.; Finér, L.; Pahikkala, T.; Heikkonen, J. Estimating the rut depth by UAV photogrammetry. Remote Sens. 2017, 9, 1279. https://doi.org/10.3390/rs9121279. [CrossRef]
  6. Talbot, B.; Rahlf, J.; Astrup, R. An operational UAV-based approach for stand-level assessment of soil disturbance after forest harvesting. Scand. J. Forest Res. 2018, 33, 387–396. https://doi.org/10.1080/02827581.2017.1418421. [CrossRef]
  7. Kim, J.; Kim, I.; Ha, E.; Choi, B. UAV photogrammetry for soil surface deformation detection in a timber harvesting area, South Korea. Forests. 2023, 14, 980. https://doi.org/10.3390/f14050980. [CrossRef]
  8. Domingo, D.; Ørka, H.O.; Næsset, E.; Kachamba, D.; Gobakken, T. Effects of UAV image resolution, camera type, and image overlap on accuracy of biomass predictions in a tropical woodland. Remote Sens. 2019, 11, 948. https://doi.org/10.3390/rs11080948. [CrossRef]
  9. De Almeida, D.R.A.; Broadbent, E.N.; Ferreira, M.P.; Meli, P.; Zambrano, A.M.A.; Gorgens, E.B.; Resende, A.F.; de Almeida, C.T.; do Amaral, C.H.; Corte, A.P.D.; et al. Monitoring restored tropical forest diversity and structure through UAV-borne hyperspectral and lidar fusion. Remote Sens. Environ. 2021, 264, 112582. https://doi.org/10.1016/j.rse.2021.112582. [CrossRef]
  10. Brede, B.; Terryn, L.; Barbier, N.; Bartholomeus, H.M.; Bartolo, R.; Calders, K.; Derroire, G.; Krishna Moorthy, S.M.; Lau, A.; Levick, S.R.; et al. Non-destructive estimation of individual tree biomass: Allometric models, terrestrial and UAV laser scanning. Remote Sens. Environ. 2022, 280, 113180. https://doi.org/10.1016/j.rse.2022.113180. [CrossRef]
  11. Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth Sci. Rev. 2020, 207, 103225. https://doi.org/10.1016/j.earscirev.2020.103225. [CrossRef]
  12. Tempa, K.; Peljor, K.; Wangdi, S.; Ghalley, R.; Jamtsho, K.; Ghalley, S.; Pradhan, P. UAV technique to localize landslide susceptibility and mitigation proposal: A case of Rinchending Goenpa landslide in Bhutan. Nat. Hazards Research. 2021, 1, 171–186. https://doi.org/10.1016/j.nhres.2021.09.001. [CrossRef]
  13. Cao, C.; Zhu, K.; Xu, P.; Shan, B.; Yang, G.; Song, S. Refined landslide susceptibility analysis based on InSAR technology and UAV multi-source data. J. Cleaner Prod. 2022, 368, 133146. https://doi.org/10.1016/j.jclepro.2022.133146. [CrossRef]
  14. Ganasri, B.P.; Ramesh, H. Assessment of soil erosion by RUSLE model using remote sensing and GIS—A case study of Nethravathi Basin. Geosci. Front. 2016, 7, 953–961. https://doi.org/10.1016/j.gsf.2015.10.007. [CrossRef]
  15. Ghosal, K.; Das Bhattacharya, S. A review of RUSLE model. J. Indian Soc. Remote Sens. 2020, 48, 689–707. https://doi.org/10.1007/s12524-019-01097-0. [CrossRef]
  16. Lee, S.; Talib, J.A. Probabilistic landslide susceptibility and factor effect analysis. Environ. Geol. 2005, 47, 982–990. https://doi.org/10.1007/s00254-005-1228-z. [CrossRef]
  17. Catani, F.; Lagomarsino, D.; Segoni, S.; Tofani, V. Landslide susceptibility estimation by random forests technique: Sensitivity and scaling issues. Nat. Hazards Earth Syst. Sci. 2013, 13, 2815–2831. https://doi.org/10.5194/nhess-13-2815-2013. [CrossRef]
  18. Chen, Z.; Ye, F.; Fu, W.; Ke, Y.; Hong, H. The influence of DEM spatial resolution on landslide susceptibility mapping in the Baxie River basin, NW China. Nat. Hazards. 2020, 101, 853–877. https://doi.org/10.1007/s11069-020-03899-9. [CrossRef]
  19. Weirich, F.; Blesius, L. Comparison of satellite and air photo based landslide susceptibility maps. Geomorphology. 2007, 87, 352–364. https://doi.org/10.1016/j.geomorph.2006.10.003. [CrossRef]
  20. Effat, H.A.; Hegazy, M.N. Mapping landslide susceptibility using satellite data and spatial multicriteria evaluation: The case of Helwan District, Cairo. Appl. Geomat. 2014, 6, 215–228. https://doi.org/10.1007/s12518-014-0137-9. [CrossRef]
  21. He, Q.; Xu, Z.; Li, S.; Li, R.; Zhang, S.; Wang, N.; Pham, B.T.; Chen, W. Novel entropy and rotation forest-based credal decision tree classifier for landslide susceptibility modeling. Entropy (Basel). 2019, 21, 106. https://doi.org/10.3390/e21020106. [CrossRef]
  22. Gorsevski, P.V.; Brown, M.K.; Panter, K.; Onasch, C.M.; Simic, A.; Snyder, J. Landslide detection and susceptibility mapping using LiDAR and an artificial neural network approach: A case study in the Cuyahoga Valley National Park, Ohio. Landslides. 2016, 13, 467–484. https://doi.org/10.1007/s10346-015-0587-0. [CrossRef]
  23. Różycka, M.; Migoń, P.; Michniewicz, A. Topographic Wetness Index and Terrain Ruggedness Index in geomorphic characterisation of landslide terrains, on examples from the Sudetes, SW Poland. Z. Geomorphol. Suppl. 2017, 61, 61–80. https://doi.org/10.1127/zfg_suppl/2016/0328. [CrossRef]
  24. Huang, F.; Chen, J.; Du, Z.; Yao, C.; Huang, J.; Jiang, Q.; Chang, Z.; Li, S. Landslide susceptibility prediction considering regional soil erosion based on machine-learning models. ISPRS Int. J. Geo Inf. 2020, 9, 377. https://doi.org/10.3390/ijgi9060377. [CrossRef]
  25. Zhang, T.; Fu, Q.; Wang, H.; Liu, F.; Wang, H.; Han, L. Bagging-based machine learning algorithms for landslide susceptibility modeling. Nat. Hazards. 2022, 110, 823–846. https://doi.org/10.1007/s11069-021-04986-1. [CrossRef]
  26. Pham, B.T.; Tien Bui, D.; Prakash, I.; Nguyen, L.H.; Dholakia, M.B. A comparative study of sequential minimal optimization-based support vector machines, vote feature intervals, and logistic regression in landslide susceptibility assessment using GIS. Environ. Earth Sci. 2017, 76, 1–15. [CrossRef]
  27. Maxwell, A.E.; Sharma, M.; Kite, J.S.; Donaldson, K.A.; Thompson, J.A.; Bell, M.L.; Maynard, S.M. Slope failure prediction using random forest machine learning and LiDAR in an eroded folded mountain belt. Remote Sens. 2020, 12, 486. https://doi.org/10.3390/rs12030486. [CrossRef]
  28. Roy, P.; Chakrabortty, R.; Chowdhuri, I.; Malik, S.; Das, B.; Pal, S.C. Development of different machine learning ensemble classifier for gully erosion susceptibility in Gandheswari Watershed of West Bengal, India. In Machine Learning for Intelligent Decision Science, 2020; pp. 1–26. https://doi.org/10.1007/978-981-15-3689-2_1. [CrossRef]
  29. Shano, L.; Raghuvanshi, T.K.; Meten, M. Landslide susceptibility evaluation and hazard zonation techniques–a review. Geoenviron. Disasters. 2020, 7, 1–19. [CrossRef]
  30. Kuradusenge, M.; Kumaran, S.; Zennaro, M. Rainfall-induced landslide prediction using machine learning models: The case of Ngororero District, Rwanda. Int. J. Environ. Res. Public Health. 2020, 17, 4147. https://doi.org/10.3390/ijerph17114147. [CrossRef]
  31. Shirvani, Z. A holistic analysis for landslide susceptibility mapping applying geographic object-based random forest: A comparison between protected and non-protected forests. Remote Sens. 2020, 12, 434. https://doi.org/10.3390/rs12030434. [CrossRef]
  32. Can, R.; Kocaman, S.; Gokceoglu, C. A comprehensive assessment of XGBoost algorithm for landslide susceptibility mapping in the upper basin of Ataturk dam, Turkey. Appl. Sci. 2021, 11, 4993. https://doi.org/10.3390/app11114993. [CrossRef]
  33. Oh, H.J.; Lee, S. Shallow landslide susceptibility modeling using the data mining models artificial neural network and boosted tree. Appl. Sci. 2017, 7, 1000. https://doi.org/10.3390/app7101000. [CrossRef]
  34. Liu, Y.; Zhao, L.; Bao, A.; Li, J.; Yan, X. Chinese high resolution satellite data and GIS-based assessment of landslide susceptibility along highway G30 in Guozigou Valley using logistic regression and MaxEnt model. Remote Sens. 2022, 14, 3620. https://doi.org/10.3390/rs14153620. [CrossRef]
  35. Fernández, T.; Pérez, J.; Cardenal, J.; Gómez, J.; Colomo, C.; Delgado, J. Analysis of landslide evolution affecting olive groves using UAV and photogrammetric techniques. Remote Sens. 2016, 8, 837. https://doi.org/10.3390/rs8100837. [CrossRef]
  36. Lindner, G.; Schraml, K.; Mansberger, R.; Hübl, J. UAV monitoring and documentation of a large landslide. Appl. Geomat. 2016, 8, 1–11. https://doi.org/10.1007/s12518-015-0165-0. [CrossRef]
  37. Rossi, G.; Tanteri, L.; Tofani, V.; Vannocci, P.; Moretti, S.; Casagli, N. Multitemporal UAV surveys for landslide mapping and characterization. Landslides. 2018, 15, 1045–1052. https://doi.org/10.1007/s10346-018-0978-0. [CrossRef]
  38. Van Dao, D.V.; Jaafari, A.; Bayat, M.; Mafi-Gholami, D.; Qi, C.; Moayedi, H.; Phong, T.V.; Ly, H.; Le, T.; Trinh, P.T.; et al. A spatially explicit deep learning neural network model for the prediction of landslide susceptibility. CATENA. 2020, 188, 104451. https://doi.org/10.1016/j.catena.2019.104451. [CrossRef]
  39. Goyes-Peñafiel, P.; Hernandez-Rojas, A. Landslide susceptibility index based on the integration of logistic regression and weights of evidence: A case study in Popayan, Colombia. Eng. Geol. 2021, 280, 105958. https://doi.org/10.1016/j.enggeo.2020.105958. [CrossRef]
  40. Kumar, C.; Walton, G.; Santi, P.; Luza, C. An ensemble approach of feature selection and machine learning models for regional landslide susceptibility mapping in the arid mountainous terrain of Southern Peru. Remote Sens. 2023, 15, 1376. https://doi.org/10.3390/rs15051376. [CrossRef]
  41. Neugirg, F.; Kaiser, A.; Schmidt, J.; Becht, M.; Haas, F. Quantification, analysis and modelling of soil erosion on steep slopes using LiDAR and UAV photographs. Proc. IAHS. 2015, 367, 51–58. https://doi.org/10.5194/piahs-367-51-2015. [CrossRef]
  42. Gao, J.; Shi, X.; Li, L.; Zhou, Z.; Wang, J. Assessment of landslide susceptibility using different machine learning methods in Longnan City, China. Sustainability. 2022, 14, 16716. https://doi.org/10.3390/su142416716. [CrossRef]
  43. Zhang, J.; Ma, X.; Zhang, J.; Sun, D.; Zhou, X.; Mi, C.; Wen, H. Insights into geospatial heterogeneity of landslide susceptibility based on the SHAP-XGBoost model. J. Environ. Manage. 2023, 332, 117357. https://doi.org/10.1016/j.jenvman.2023.117357. [CrossRef]
  44. Zhou, J.; Tan, S.; Li, J.; Xu, J.; Wang, C.; Ye, H. Landslide susceptibility assessment using the analytic hierarchy process (AHP): A case study of a construction site for photovoltaic power generation in Yunxian County, Southwest China. Sustainability. 2023, 15, 5281. https://doi.org/10.3390/su15065281. [CrossRef]
  45. Gorsevski, P.V. An evolutionary approach for spatial prediction of landslide susceptibility using LiDAR and symbolic classification with genetic programming. Nat. Hazards. 2021, 108, 2283–2307. https://doi.org/10.1007/s11069-021-04780-z. [CrossRef]
  46. Lidberg, W.; Nilsson, M.; Ågren, A. Using machine learning to generate high-resolution wet area maps for planning forest management: A study in a boreal forest landscape. Ambio. 2020, 49, 475–486. https://doi.org/10.1007/s13280-019-01196-9. [CrossRef]
  47. Taalab, K.; Cheng, T.; Zhang, Y. Mapping landslide susceptibility and types using Random Forest. Big Earth Data. 2018, 2, 159–178. https://doi.org/10.1080/20964471.2018.1472392. [CrossRef]
  48. Park, S.; Hamm, S.-Y.; Kim, J. Performance evaluation of the GIS-based data-mining techniques decision tree, random forest, and rotation forest for landslide susceptibility modeling. Sustainability. 2019, 11, 5659. https://doi.org/10.3390/su11205659. [CrossRef]
  49. Sun, D.; Xu, J.; Wen, H.; Wang, D. Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest. Eng. Geol. 2021, 281, 105972. https://doi.org/10.1016/j.enggeo.2020.105972. [CrossRef]
  50. Saha, S.; Saha, A.; Hembram, T.K.; Pradhan, B.; Alamri, A.M. Evaluating the performance of individual and novel ensemble of machine learning and statistical models for landslide susceptibility assessment at Rudraprayag District of Garhwal Himalaya. Appl. Sci. 2020, 10, 3772. https://doi.org/10.3390/app10113772. [CrossRef]
  51. Huang, F.; Ye, Z.; Jiang, S.; Huang, J.; Chang, Z.; Chen, J. Uncertainty study of landslide susceptibility prediction considering the different attribute interval numbers of environmental factors and different data-based models. CATENA. 2021, 202, 105250. https://doi.org/10.1016/j.catena.2021.105250. [CrossRef]
  52. Hussain, M.A.; Chen, Z.; Wang, R.; Shoaib, M. PS-InSAR-based validated landslide susceptibility mapping along Karakorum Highway, Pakistan. Remote Sens. 2021, 13, 4129. https://doi.org/10.3390/rs13204129. [CrossRef]
Figure 1. Overall workflow of the erosion susceptibility (ES) model.
Figure 2. Location and panoramic view of the timber harvesting area in the experimental forest of Kangwon National University.
Figure 3. DSMs of June 10 and July 9 from the UAV photogrammetry method: (a) DSM derived from the UAV survey on June 10; (b) DSM derived from the UAV survey on July 9.
Figure 4. DSM of difference (DoD) calculated from the subtraction of DSM acquired in July and June.
Figure 5. Terrain independent variables mapped for the stacking process and ES analysis: (a) topographic wetness index (TWI), (b) slope, (c) terrain roughness index (TRI), (d) profile curvature (PRC), (e) plan curvature (PLC), and (f) aspect.
Figure 6. AUC of stratified cross-validation (SCV) results from the XGB model with 10% sample dataset of the total dataset; (a) 5 SCV split set, and (b) 10 SCV split set.
Figure 7. Stratified cross-validation (SCV) results of the LR and RF models for the 5-split SCV set from 10% of the total dataset: (a) AUC-LR from the 5-split SCV set; (b) AUC-RF from the 5-split SCV set.
Figure 9. Identification of ES at wheel tracks and logging tracks in the timber harvesting area from the XGB map.
Table 1. Parameters used in each process to generate 3D data.
| Process | Parameter | Setting |
| --- | --- | --- |
| Align Photos | 2D images input | 140 m + 100 m |
| | Accuracy | Highest |
| | Reference preselection | On |
| | Key point limit | 40,000 |
| | Tie point limit | 4,000 |
| Build Dense Cloud | Quality | Ultra-high |
| | Depth filtering | Aggressive |
| Build DSM | Projection | WGS 84 (EPSG: 4326) |
| | Source data | Dense cloud |
| | Point classes | All |
| Build Orthomosaic | Projection | WGS 84 (EPSG: 4326) |
| | Surface | DEM |
Table 3. Confusion matrix of classification results analyzed by erosion susceptibility models.
| | Predicted: X'1 (non-erosion susceptible) | Predicted: X'0 (erosion susceptible) |
| --- | --- | --- |
| Observed: X'1 (non-erosion occurred) | True positive | False negative |
| Observed: X'0 (erosion occurred) | False positive | True negative |
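As a worked illustration of how the overall accuracy (OA) follows from the four cells of Table 3, the sketch below divides the correctly classified pixels (true positives plus true negatives) by all pixels. The function name `overall_accuracy` and the counts are hypothetical and are not results from this study.

```python
def overall_accuracy(tp, fn, fp, tn):
    """Overall accuracy = correctly classified samples / all samples."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical pixel counts for illustration only (not study results).
oa = overall_accuracy(tp=50, fn=20, fp=10, tn=20)
print(f"OA = {oa:.0%}")
```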
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.