4. Discussion
In this study, six quantitative MMS maps were generated using the WoE method, a bivariate statistical approach, and five ML techniques: LR, SVM, NB, MLP, and RF. All six maps show the same spatial pattern: susceptibility is higher (values close to 1) in the elevated parts of the study area and lower (values predominantly close to 0) in the low-lying parts. In terms of surface area, approximately 33% to 40% of the study area falls within the high and very high susceptibility levels in all six models, while the very low and low levels cover approximately 40% to 44%. The distribution of surface area across susceptibility levels is also similar in all models. Spatially, the highest susceptibility levels are found on the eastern edge of the NLC, where the highest parts of the Andean foothills are located.
Validation is a crucial step in MMS mapping. As indicated in the methodology section, two metrics were used to evaluate and validate the susceptibility models: the AUC value and the F-1 score. Overall model performance is evaluated using AUC analysis [49, 50]. Several studies have suggested that an AUC value between 0.8 and 0.9 indicates a very good model, while a value higher than 0.9 indicates an excellent model [8, 9]. All ML-based MMS models applied in this study (LR, MLP, SVM, RF, and NB) exceeded an AUC value of 0.9, indicating excellent performance in predicting the presence and absence of MM phenomena. For the training data, the highest AUC value was obtained by the RF model (AUC=1.000), followed by SVM (AUC=0.994) and by MLP and LR (both AUC=0.986); the difference between the maximum and minimum AUC values was only 1.9%. For the evaluation data, all MMS models also behave as excellent classifiers of the presence and absence of MM phenomena, with AUC values very close to 1. All MMS models exceed an F-1 score of 0.950, with the RF model reaching the highest value (F-1=0.991). Finally, the RF model was also the most accurate (Accuracy=0.989).
Figure 8.
ROC curve and AUC value for training (a) and test (b) data.
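As an illustration of the validation metrics discussed above, the following minimal Python sketch computes the AUC, the F-1 score, and the accuracy for a binary MM / non-MM classifier. It is not the code used in this study: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made only for this example.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    The AUC equals the probability that a randomly chosen positive (MM)
    sample receives a higher score than a randomly chosen negative
    (non-MM) sample; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_and_accuracy(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """F-1 score and accuracy after thresholding the scores (threshold is assumed)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return f1, accuracy


# Toy data (hypothetical): 1 = MM present, 0 = MM absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.05]

auc = roc_auc(y_true, scores)
f1, acc = f1_and_accuracy(y_true, scores)
print(f"AUC={auc:.3f}  F-1={f1:.3f}  Accuracy={acc:.3f}")
```

Note that the AUC is threshold-independent, whereas the F-1 score and the accuracy depend on the threshold used to convert susceptibility scores into presence or absence.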
Similar results have been found in studies by [46, 51, 52], where the susceptibility to landslides was compared by applying different ML techniques such as NB, k-NN, RF, DNN, LR, BRT, and SVM. They found that the model with the best training and evaluation metrics is the RF model, with AUC values exceeding 0.920 in training and up to 1.000 in evaluation.
Table 7 presents the training and evaluation metrics of the MMS models generated in this study.
To compare the MMS results of WoE, LR, MLP, SVM, RF and NB with the heuristic method, the susceptibility levels were standardized into five classes. Subsequently, the susceptibility level was extracted at each point-type vector (PTV, centroids of the test polygons) of MM in the study area. For the heuristic model, 69.7% of the PTV fall in the high and very high MMS levels. On the other hand, for RF, SVM, LR, NB, and WoE, 97.0%, 90.9%, 90.7%, 90.9%, 87.9%, and 78.8% of the PTV are at the highest susceptibility levels, namely, high and very high. This indicates that the proposed ML-based models discriminate MM events better than the heuristic method, because they are designed to automatically obtain the optimal nonlinear relationship between the study variables [17, 51, 52].
Figure 9.
Predictive capability of methods, heuristics, WoE, LR, MLP, SVM, RF and NB.
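The comparison above amounts to classifying the susceptibility value at each test point into one of five levels and computing the share of points that fall in the high and very high levels. The following Python sketch illustrates that calculation. The equal-interval class breaks (0.2, 0.4, 0.6, 0.8) and the sample values are assumptions made for this example; they are not the classification scheme or the data of this study.

```python
from typing import Iterable, Sequence

LEVELS = ["very low", "low", "moderate", "high", "very high"]


def classify(value: float, breaks: Sequence[float] = (0.2, 0.4, 0.6, 0.8)) -> str:
    """Map a susceptibility value in [0, 1] to one of five levels.

    `breaks` holds the upper limits of the first four classes; the
    equal-interval defaults are an assumption for illustration only.
    """
    for level, upper in zip(LEVELS, breaks):
        if value < upper:
            return level
    return LEVELS[-1]


def hit_rate(values: Iterable[float], target: Sequence[str] = ("high", "very high")) -> float:
    """Percentage of points whose susceptibility level is in `target`."""
    levels = [classify(v) for v in values]
    return 100.0 * sum(level in target for level in levels) / len(levels)


# Hypothetical susceptibility values sampled at eight MM test centroids.
sample = [0.91, 0.85, 0.72, 0.66, 0.58, 0.31, 0.95, 0.62]
print(f"{hit_rate(sample):.1f}% of test points in high or very high levels")
```

A higher percentage means that more of the known MM events are located in the zones the model flags as most susceptible.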
The hazard levels suggest that, in the event of an El Niño phenomenon, close to 40% of the surface area of the NLC would be in the high and very high hazard levels. More than half of the surface area of the Carabayllo district (54.2%) would be under these levels, followed by the districts of Comas, Independencia, and Ancon, with approximately 37.0% each. On the other hand, Los Olivos, Puente Piedra, and San Martin de Porres have less than 8% of their surface area at the highest hazard levels. Regarding the seismic hazard scenario, the triggering factor (seismic microzonation) did not cover the entire study area; it covered only 4.9%, 24.6%, 75.3%, 63.2%, 94.3%, 79.7%, and 70.4% of the surface area of Ancon, Carabayllo, Comas, Independencia, Los Olivos, Puente Piedra, and San Martin de Porres, respectively. Therefore, the seismic percentages refer to the area covered by the seismic microzonation, not to the total area of each district. The districts of Ancon, Carabayllo, Comas, Independencia, and Puente Piedra have between 41 and 50% of their covered surface area under high and very high seismic hazard levels in the event of an earthquake of magnitude greater than 8.8 Mw. Table 8 shows the hazard levels expressed in surface area for the El Niño phenomenon and the seismic event.
Table 8.
Hazard levels for MM under an El Niño phenomenon and earthquake greater than 8.8Mw.
| District | El Niño VL (km2) | El Niño L (km2) | El Niño M (km2) | El Niño H (km2) | El Niño VH (km2) | Seismic VL (km2) | Seismic L (km2) | Seismic M (km2) | Seismic H (km2) | Seismic VH (km2) |
|---|---|---|---|---|---|---|---|---|---|---|
| Ancón | 38.711 | 73.605 | 80.316 | 47.446 | 69.538 | 0.613 | 5.095 | 3.192 | 4.127 | 2.191 |
| Carabayllo | 41.491 | 50.580 | 50.618 | 81.983 | 86.703 | 12.946 | 13.747 | 11.755 | 17.185 | 20.933 |
| Comas | 15.012 | 9.032 | 6.663 | 8.591 | 9.473 | 8.248 | 5.827 | 7.046 | 5.863 | 9.757 |
| Independencia | 5.441 | 0.678 | 3.817 | 5.087 | 0.987 | 1.648 | 0.884 | 2.431 | 1.716 | 3.446 |
| Los Olivos | 12.621 | 3.600 | 1.325 | 0.678 | 0.000 | 3.985 | 2.522 | 8.483 | 1.704 | 0.486 |
| Puente Piedra | 20.155 | 14.179 | 12.097 | 3.849 | 0.026 | 4.385 | 6.866 | 11.316 | 11.307 | 6.217 |
| San Martin de Porres | 26.692 | 6.196 | 2.599 | 0.477 | 0.000 | 13.712 | 7.333 | 0.000 | 3.176 | 1.086 |
| Sum | 160.122 | 157.868 | 157.435 | 148.114 | 166.725 | 45.538 | 42.274 | 44.223 | 45.078 | 44.118 |
| % | 20.3 | 20.0 | 19.9 | 18.7 | 21.1 | 20.6 | 19.1 | 20.0 | 20.4 | 19.9 |
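The district percentages quoted in the text can be reproduced from the areas in Table 8. The short Python sketch below does this for the El Niño scenario; the dictionary and function names are assumptions introduced only for this example, while the area values are copied from the table.

```python
from typing import Dict, Tuple

# El Niño scenario areas in km2 per hazard level (VL, L, M, H, VH), from Table 8.
EL_NINO_KM2: Dict[str, Tuple[float, float, float, float, float]] = {
    "Ancón": (38.711, 73.605, 80.316, 47.446, 69.538),
    "Carabayllo": (41.491, 50.580, 50.618, 81.983, 86.703),
    "Comas": (15.012, 9.032, 6.663, 8.591, 9.473),
    "Independencia": (5.441, 0.678, 3.817, 5.087, 0.987),
    "Los Olivos": (12.621, 3.600, 1.325, 0.678, 0.000),
    "Puente Piedra": (20.155, 14.179, 12.097, 3.849, 0.026),
    "San Martin de Porres": (26.692, 6.196, 2.599, 0.477, 0.000),
}


def high_hazard_share(areas: Tuple[float, float, float, float, float]) -> float:
    """Percentage of the mapped area in the high (H) and very high (VH) levels."""
    _vl, _l, _m, h, vh = areas
    return 100.0 * (h + vh) / sum(areas)


for district, areas in EL_NINO_KM2.items():
    print(f"{district}: {high_hazard_share(areas):.1f}%")

# Share for the whole study area: sum H and VH over all districts.
total_area = sum(sum(a) for a in EL_NINO_KM2.values())
total_high = sum(a[3] + a[4] for a in EL_NINO_KM2.values())
print(f"All districts: {100.0 * total_high / total_area:.1f}%")
```

The same function applies to the seismic columns, in which case the result is relative to the area covered by the seismic microzonation rather than to the total district area.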
In this study, MMS mapping was implemented to identify the areas most prone to MM and to evaluate the associated hazard under two scenarios: the first considering an El Niño phenomenon and the second considering an earthquake above 8.8 Mw. Susceptibility and hazard mapping are fundamental processes in disaster risk management, as they enable the identification of risk-prone areas so that prevention and risk reduction strategies can be proposed. Therefore, errors in susceptibility and hazard mapping can lead to false conclusions, resulting in loss of lives and livelihoods [46].
As evidenced in both this study and previous research [6, 53, 54], the quantitative approach based on ML techniques offers a precise and efficient methodology for processing large and complex datasets that include geological, topographic, hydrological, climatic, environmental, and anthropogenic factors. In contrast, classical qualitative and semi-quantitative methods assign subjective and artificial weights based on expert judgment and experience.
It is relevant to highlight that several factors can introduce uncertainty in the application of ML models, such as the number and type of variables, data quality, and the number of MM and non-MM inventory samples used for training, among others [17, 53]. In this research, the aim was not to control all of these factors but to make the best use of the available resources; therefore, the uncertainty regarding the number and type of variables was minimized by conducting a comprehensive analysis of the topographic variables in the models, including correlation analysis, multicollinearity analysis, and dimensionality reduction using PCA. This approach allowed us to remove correlated variables, reduce noise, and mitigate the risk of overfitting, thus improving the accuracy of the models [55, 56]. Additionally, the WoE analysis was employed to identify causal relationships between instability factors and the distribution of MM. In summary, methodologies were integrated and combined to reduce model uncertainty, resulting in hybrid models based on PCA and WoE for LR, MLP, SVM, RF and NB.
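To make the role of the WoE analysis concrete, the following Python sketch computes the positive weight, the negative weight, and the contrast for a single class of a conditioning factor, using the standard bivariate formulation W+ = ln[P(class | MM) / P(class | non-MM)] and W- = ln[P(not class | MM) / P(not class | non-MM)]. The function name and the pixel counts are hypothetical, and the exact implementation used in this study may differ.

```python
import math
from typing import Tuple


def weights_of_evidence(
    mm_in: int, mm_out: int, non_in: int, non_out: int
) -> Tuple[float, float, float]:
    """Return (W+, W-, contrast) for one class of a conditioning factor.

    mm_in   : MM pixels inside the class
    mm_out  : MM pixels outside the class
    non_in  : non-MM pixels inside the class
    non_out : non-MM pixels outside the class

    All four counts must be positive; a zero count makes a weight
    undefined and needs separate handling.
    """
    if min(mm_in, mm_out, non_in, non_out) <= 0:
        raise ValueError("all pixel counts must be positive")
    p_class_given_mm = mm_in / (mm_in + mm_out)
    p_class_given_non = non_in / (non_in + non_out)
    w_plus = math.log(p_class_given_mm / p_class_given_non)
    w_minus = math.log((1.0 - p_class_given_mm) / (1.0 - p_class_given_non))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 80 of 100 MM pixels and 2,000 of 10,000 non-MM
# pixels fall inside a given slope class.
w_plus, w_minus, contrast = weights_of_evidence(80, 20, 2000, 8000)
print(f"W+={w_plus:.3f}  W-={w_minus:.3f}  C={contrast:.3f}")
```

A positive contrast indicates that the class is spatially associated with MM occurrence, a negative contrast indicates the opposite, and a value near zero indicates little association.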
However, it is important to recognize that the success of ML models depends on the information provided by experts and on the quality of the input data. Therefore, their implementation at a national or regional scale in other territories must be carefully evaluated, ensuring a proper methodological workflow and the availability of high-quality inputs, especially regarding geological, topographical, and environmental factors. Additionally, the spatial resolution of the triggering factors must be chosen to match their spatial and temporal variability.
4.1. Limitations
Regarding the limitations of this study, it is noteworthy that information about the triggering events of the MM is lacking: the inventory does not specify whether they were triggered by extreme rainfall, earthquakes, anthropogenic causes, or other factors. Additionally, the scale of the geological, geomorphological, and hydrogeological inputs used in this study (1:100,000) may not be suitable if decisions need to be made at a detailed scale. Furthermore, the DEM used was generated at the beginning of the last decade, so later changes in the topography are not considered. Hyperparameter optimization was not carried out, as it was not an objective of the research; nevertheless, satisfactory results were obtained in the training and evaluation metrics of the models. Finally, further studies are needed to improve seismic microzonation in the study area, especially given the longitudinal growth in the periphery of Lima.
4.2. Perspectives
In terms of future perspectives, six MMS maps were presented, five of them based on machine learning techniques with excellent results and one based on bivariate statistics with good results. All models showed better classification metrics for MM events than the classic heuristic method. These ML models offer a valuable tool for disaster risk management, particularly in the processes of risk estimation, prevention, reduction, and reconstruction. The application of ML techniques, supported by available data, has the potential to significantly improve MM zoning and, ultimately, contribute to the resilience of communities against these natural events.