3.3.4. Best Machine Learning Model
After running the three ML methods and determining the best CT (a coherence of 0.96), it was possible to analyze this case further using the indices defined in the Methodology. Table 3 summarizes the results for all ML models with the best CT.
In addition to Table 3, the results are displayed in Figure 18, which gives a visual comparison of how the ML methods performed against each other.
Starting with Accuracy, all three models performed similarly (LR 0.59, RF 0.64 and SVC 0.66), with SVC obtaining the highest value. This indicates that LR, RF and SVC are comparable in correctly identifying "Events" and "No Events".
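Accuracy is computed from the four cells of the binary confusion matrix. The minimal Python sketch below assumes "Event" is the positive class; the counts in the example are hypothetical and are not the counts behind Table 3.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples ("Events" and "No Events") classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total

# Hypothetical counts, NOT taken from Table 3: 33 correct out of 50 samples.
print(round(accuracy(tp=18, tn=15, fp=9, fn=8), 2))  # 0.66
```

Because Accuracy pools both classes, two models can reach similar values while making very different kinds of errors, which is why the class-specific indices below are also needed.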
Specificity, which measures the ratio of TN amongst all negative samples (TN + FP), was highest for SVC (0.65), followed closely by RF (0.60). This means that both RF and SVC correctly identified a larger proportion of "No Events" (negative samples) than LR (0.27).
In terms of FPR (False-Positive Rate), the ratio of negative samples that were incorrectly classified as positive, FP / (FP + TN), was highest for LR (0.73). LR is therefore more prone to misclassifying "No Events" as "Events" than RF (0.40) and SVC (0.35). Since FPR = 1 - Specificity, these values mirror the Specificity results.
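Specificity and FPR share the same denominator (all negative samples, TN + FP), so FPR = 1 - Specificity holds for any classifier. The short Python sketch below defines both indices and, using only the rounded values reported above, checks that each model's reported pair is consistent with this identity.

```python
def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP): share of "No Events" correctly classified as negative."""
    return tn / (tn + fp)

def false_positive_rate(tn: int, fp: int) -> float:
    """FP / (FP + TN): share of "No Events" wrongly classified as "Events"."""
    return fp / (fp + tn)

# Rounded (Specificity, FPR) pairs as reported in the text.
reported = {"LR": (0.27, 0.73), "RF": (0.60, 0.40), "SVC": (0.65, 0.35)}
for model, (spec, fpr) in reported.items():
    print(model, round(1 - spec, 2) == fpr)  # True for all three models
```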
Precision, the ratio of TP amongst all positive-classified samples, was highest for SVC (0.67), followed again by RF (0.65), with LR last (0.56). SVC and RF therefore have a slight advantage over LR: a sample they classify as positive is more likely to be a true "Event".
Recall, which indicates the ratio of "Events" that were correctly classified, was highest for LR (0.88) compared to RF and SVC (both 0.67), meaning that LR correctly identified 88% of the Events. This is directly related to LR's tendency to classify samples as positive, which produces more FP but fewer FN than RF and SVC. F1score, the harmonic mean of Recall and Precision, was very similar for the three models, which indicates that, despite these differences, the models are comparable in overall performance and behavior.
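The harmonic mean explains why F1score stays close across the models even though LR has a much higher Recall: its lower Precision pulls the mean down. The Python sketch below recomputes F1score from the rounded Precision and Recall values reported above; because the inputs are rounded, the results approximate, rather than reproduce, the exact values behind Table 3.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of Precision and Recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded (Precision, Recall) pairs as reported in the text.
reported = {"LR": (0.56, 0.88), "RF": (0.65, 0.67), "SVC": (0.67, 0.67)}
for model, (p, r) in reported.items():
    print(model, round(f1_score(p, r), 2))
# LR 0.68, RF 0.66, SVC 0.67
```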
MCC, a coefficient that measures the quality of binary classifications, was highest for SVC (0.32), followed by RF (0.28) and LR (0.19). SVC and RF thus perform similarly, with SVC having an edge, while LR is considerably behind.
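MCC uses all four confusion-matrix cells, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which makes it less sensitive than Accuracy to a classifier that simply favours one class. In the Python sketch below, the counts are hypothetical, and returning 0 when a marginal total is zero is a common convention assumed here, not necessarily the one used in this study.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient: 1 = perfect, 0 = chance level, -1 = total disagreement."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an undefined MCC (some marginal total is zero) is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion matrices over 50 samples (26 Events, 24 No Events).
print(round(mcc(tp=18, tn=15, fp=9, fn=8), 2))  # 0.32
print(mcc(tp=26, tn=0, fp=24, fn=0))  # 0.0 for an "always Event" classifier
```

In the second hypothetical case the classifier reaches an Accuracy of 0.52 (26 of 50) and a Recall of 1.0, yet its MCC is 0, which illustrates why MCC is useful alongside Accuracy and Recall.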
Accuracy alone does not reveal enough difference between the models to determine which is best for this type of data. The OA (Overall Accuracy), defined as the sum of Accuracy, F1score and MCC, is therefore used as the deciding measure of model performance. As the trend in the individual results suggested, SVC was the best-performing model (1.65), followed closely by RF (1.58), with LR further behind (1.47).
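Because Accuracy and F1score lie in [0, 1] and MCC lies in [-1, 1], OA ranges from -1 to 3 and is best read as a composite score rather than a proportion. The Python sketch below sums the three indices for each model, using the rounded Accuracy and MCC values reported above and F1score recomputed from the reported Precision and Recall (an approximation, since the exact F1score values are not given in the text).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of Precision and Recall."""
    return 2 * precision * recall / (precision + recall)

def overall_score(accuracy: float, f1: float, mcc: float) -> float:
    """OA as defined in this study: the unweighted sum Accuracy + F1score + MCC."""
    return accuracy + f1 + mcc

# Rounded values from the text: (Accuracy, Precision, Recall, MCC).
reported = {
    "LR": (0.59, 0.56, 0.88, 0.19),
    "RF": (0.64, 0.65, 0.67, 0.28),
    "SVC": (0.66, 0.67, 0.67, 0.32),
}
scores = {m: overall_score(a, f1_score(p, r), c) for m, (a, p, r, c) in reported.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['SVC', 'RF', 'LR']
for m in ranking:
    print(m, round(scores[m], 2))
```

With these rounded inputs LR comes out at about 1.46 rather than the reported 1.47, but the ranking SVC > RF > LR is unchanged.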
Overall, SVC provided the best results across the indices, except for Recall, where LR obtained a higher TP ratio; this is expected given that LR produced fewer FN than the other models. This advantage, however, was not enough for LR to surpass the other methods, which is why SVC was selected as the best method for analyzing this data.
However, the OA was not higher because some Events had no corresponding precipitation, suggesting that the occurrence of landslides on the island is not caused solely by precipitation. The island is seismically active, and earthquakes could also trigger landslides. For this reason, other variables such as seismic activity and soil moisture could be included as inputs to improve the performance of the models and to evaluate their relationship with the occurrence of Events.
Our study on Ometepe Island, Nicaragua, provides significant insights into the relationship between precipitation patterns and shallow landslide occurrence. The application of machine learning models, namely Logistic Regression, Random Forest, and Support Vector Classifier, has facilitated a deeper understanding of this relationship. Notably, the Support Vector Classifier with a sigmoid kernel achieved the best performance in relating precipitation data to landslide events. This outcome underscores the potential of machine learning for landslide prediction and risk assessment, particularly in regions with complex geophysical characteristics such as Ometepe Island.
A critical finding of our research is the correlation between precipitation and landslide events, particularly when precipitation is aggregated over a 7-day hydrometeorological period. This correlation is essential for understanding the triggering mechanisms of landslides in the region. However, our results also indicate that precipitation is not the only contributing factor, since some Events had no corresponding precipitation; given the island's seismic activity, earthquakes may also play an important role. This suggests the need for comprehensive risk assessment models that incorporate multiple environmental variables to predict landslide occurrence accurately.
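Aggregating precipitation over the 7-day hydrometeorological period can be sketched as a trailing sum of daily totals ending on the date of interest. In the Python sketch below, the function name, the inclusion of the end day in the window, the treatment of missing days as 0 mm, and the example rainfall series are all illustrative assumptions; the aggregation procedure actually used in this study may differ.

```python
from datetime import date, timedelta

def accumulated_precipitation(daily_mm: dict[date, float], end_day: date, window_days: int = 7) -> float:
    """Trailing sum of daily precipitation (mm) over `window_days` days ending on `end_day`.

    Illustrative assumptions: the window includes `end_day` itself, and days
    missing from `daily_mm` count as 0 mm.
    """
    return sum(daily_mm.get(end_day - timedelta(days=k), 0.0) for k in range(window_days))

# Hypothetical daily series for 1-9 October 2020, not data from this study.
start = date(2020, 10, 1)
rain = [0.0, 5.0, 12.0, 30.0, 8.0, 0.0, 2.0, 40.0, 1.0]
daily = {start + timedelta(days=i): mm for i, mm in enumerate(rain)}

print(accumulated_precipitation(daily, date(2020, 10, 9)))  # 93.0 (3-9 October)
```

Treating missing days as 0 mm keeps the sketch short, but in real gauge or satellite series a gap is better flagged than silently counted as a dry day.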
Furthermore, our study contributes to the existing body of knowledge by emphasizing the importance of spatio-temporal aggregation of precipitation data in landslide analysis. The unique approach of combining SAR-derived coherence values with aggregated precipitation data provides a novel method for landslide prediction. This method could be particularly beneficial for regions with similar environmental and climatic conditions to Ometepe Island.
In light of our findings, future research should focus on integrating additional environmental variables, such as seismic data and soil moisture content, into landslide prediction models. This integration could enhance the accuracy and reliability of the models, thereby aiding in the development of effective landslide risk mitigation strategies. Additionally, extending this research to other regions with varying topographical and climatic conditions could provide further insights into the generalizability of our approach.
In conclusion, our study represents a significant step forward in the application of machine learning techniques for environmental hazard assessment. The methods and findings discussed here could serve as a foundation for future research aimed at improving landslide prediction and risk management strategies in vulnerable regions worldwide.