Abstract
This study applies and compares six machine learning algorithms for mapping landslide susceptibility in Zhangjiajie City, Hunan Province, China: three base classifiers, namely random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGB), and three hybrid classifiers that combine each of them with logistic regression (LR) (RF+LR, GBDT+LR, and XGB+LR). First, a landslide inventory map was created with 206 historical landslide points and 412 non-landslide points, and the inventory was randomly divided into a training dataset (80%) and a testing dataset (20%). Second, 15 landslide conditioning factors (altitude, slope, aspect, plane curvature, profile curvature, relief, roughness, rainfall, topographic wetness index (TWI), normalized difference vegetation index (NDVI), distance to roads, distance to rivers, land use/land cover (LULC), soil texture, and lithology) were initially selected to establish a landslide factor database. A multicollinearity test and the information gain ratio (IGR) technique were then applied to check the factors for redundancy and to rank their importance. Subsequently, the six models were evaluated with a series of metrics: accuracy, precision, recall, F-measure, area under the receiver operating characteristic (ROC) curve (AUC), kappa index, mean absolute error (MAE), and root mean square error (RMSE). By AUC, the GBDT+LR model performed best (0.8168), followed by the XGB+LR, XGB, RF+LR, GBDT, and RF models, with AUC values of 0.8124, 0.8118, 0.8060, 0.7927, and 0.7883, respectively. Each hybrid model thus achieved a higher AUC than its corresponding base classifier. These results suggest that the stacking ensemble machine learning method is promising for landslide susceptibility mapping in the Zhangjiajie area and is capable of identifying landslide-prone areas.
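The abstract does not spell out how the base classifiers are combined with LR. One common way to build a GBDT+LR hybrid is to encode each sample by the leaf it reaches in every boosted tree and fit a logistic regression on the one-hot encoded leaf indices. The sketch below illustrates that construction with scikit-learn, assuming this is the kind of hybrid intended. The data are synthetic stand-ins (618 samples and 15 features, mirroring only the sample and factor counts above), the hyperparameters are arbitrary, and computing MAE and RMSE on predicted probabilities is likewise an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the factor database (NOT the study's data):
# 618 samples (206 + 412 in the study), 15 conditioning factors.
X, y = make_classification(
    n_samples=618, n_features=15, n_informative=8,
    weights=[2 / 3, 1 / 3], random_state=0,
)

# Random 80% / 20% split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: fit the base GBDT classifier (hyperparameters are illustrative).
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)

# Step 2: map each sample to the leaf it reaches in every tree.
# For binary problems apply() returns shape (n_samples, n_estimators, 1).
leaves_train = gbdt.apply(X_train)[:, :, 0]
leaves_test = gbdt.apply(X_test)[:, :, 0]

# Step 3: one-hot encode the leaf indices and fit LR on the encoded features.
encoder = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(encoder.fit_transform(leaves_train), y_train)

# Step 4: evaluate on the held-out 20% with some of the listed metrics.
prob = lr.predict_proba(encoder.transform(leaves_test))[:, 1]
pred = (prob >= 0.5).astype(int)

auc = roc_auc_score(y_test, prob)
acc = accuracy_score(y_test, pred)
kappa = cohen_kappa_score(y_test, pred)
mae = float(np.mean(np.abs(y_test - prob)))            # assumption: on probabilities
rmse = float(np.sqrt(np.mean((y_test - prob) ** 2)))   # assumption: on probabilities

print(f"AUC={auc:.4f} accuracy={acc:.4f} kappa={kappa:.4f} "
      f"MAE={mae:.4f} RMSE={rmse:.4f}")
```

In this sketch the LR stage is fitted on the same training samples as the GBDT stage, which keeps the example short but can overfit; holding out a separate subset of the training data for the LR stage is a common alternative. The scores printed for synthetic data say nothing about the AUC values reported above.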