Subject:
Computer Science And Mathematics,
Artificial Intelligence And Machine Learning
Keywords:
machine learning; accuracy; complexity; entropy; landslide susceptibility mapping; dimensionality reduction; Principal Component Analysis (PCA)
Online: 10 July 2023 (11:11:27 CEST)
In this study, our primary objective was to analyze the tradeoff between accuracy and complexity in machine learning models, with a specific focus on the impact of reducing complexity and entropy on the production of landslide susceptibility maps. We aimed to investigate how simplifying the model and reducing entropy can affect the capture of complex patterns in the susceptibility maps. To achieve this, we conducted a comprehensive evaluation of various machine-learning algorithms for classification tasks. We compared the performance of these algorithms in terms of accuracy and complexity, considering both "before" and "after" scenarios of dimensionality reduction using Principal Component Analysis (PCA).
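The before/after-PCA comparison described above can be sketched in a few lines. This is a minimal illustration only: the synthetic data, the nearest-centroid classifier, and all dimensions are invented stand-ins, not the study's actual conditioning factors or models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a conditioning-factor matrix: 200 samples,
# 8 correlated features driven by 3 latent factors, binary label.
n = 200
base = rng.normal(size=(n, 3))
X = base @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(n, 8))
y = (base[:, 0] > 0).astype(int)

def pca(X, k):
    """Project X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def nearest_centroid_accuracy(X, y):
    """Deliberately simple classifier: assign each sample to the
    closer class centroid, then report training accuracy."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1)
            < np.linalg.norm(X - c0, axis=1)).astype(int)
    return float((pred == y).mean())

acc_full = nearest_centroid_accuracy(X, y)          # all 8 features
acc_pca = nearest_centroid_accuracy(pca(X, 2), y)   # top 2 PCs only
print(f"accuracy, 8 features: {acc_full:.3f}")
print(f"accuracy, 2 PCs:      {acc_pca:.3f}")
```

Whether accuracy rises or falls after projection depends on how much label-relevant variance the retained components capture, which is exactly the tradeoff the study examines.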
Our findings revealed that reducing complexity and lowering entropy can increase model accuracy. However, this reduction in complexity comes at the cost of losing important complex patterns in the produced landslide susceptibility maps. By simplifying the model and reducing entropy, certain intricate relationships and uncertain patterns may be overlooked, resulting in a loss of information and potentially compromising the accuracy of the susceptibility maps. The analysis encompassed a diverse range of machine learning algorithms, including Random Forest (RF), Extra Trees (EXT), XGBoost, LightGBM, CatBoost, Naive Bayes (NB), K-Nearest Neighbors (KNN), Gradient Boosting Machine (GBM), and Decision Trees (DT). Each algorithm was evaluated for its strengths and limitations, considering the tradeoff between accuracy and complexity.
Before dimensionality reduction, the algorithms demonstrated promising results, with RF exhibiting excellent AUC/ROC scores and average accuracy. However, computational cost was noted as a potential drawback for RF, especially when dealing with large datasets. EXT showcased robust performance and good accuracy, while XGBoost demonstrated its ability to handle complex relationships within large datasets, albeit requiring careful hyperparameter tuning. The efficiency and scalability of LightGBM made it a suitable choice for large datasets, although it displayed sensitivity to class imbalance. CatBoost excelled in handling categorical features, but longer training times were observed for larger datasets. NB showcased simplicity and computational efficiency but assumed independence among features. KNN, known for its capability to capture local patterns and spatial relationships, was found to be sensitive to the choice of distance metric. GBM, while capturing complex relationships effectively, was prone to overfitting without proper regularization. DT, with its interpretability and ease of understanding, faced limitations in terms of overfitting and limited generalization. After dimensionality reduction, certain algorithms, including RF, EXT, XGBoost, and LightGBM, exhibited improvements in their AUC/ROC scores and average accuracy. However, for a few algorithms, such as NB and DT, a decrease in performance was observed. This study provides valuable insights into the performance characteristics, strengths, and limitations of various machine learning algorithms in classification tasks. Researchers and practitioners can use these findings to make informed decisions when selecting algorithms for their specific datasets and requirements. We also aim to identify the potential factors contributing to the high accuracy rates obtained from the ensemble algorithms and to explore possible shortcomings of non-ensemble algorithms that may result in lower accuracy rates.
By conducting a comprehensive analysis of these algorithms, we seek to provide valuable insights into the benefits and limitations of ensemble approaches for landslide susceptibility mapping.
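The AUC/ROC index used to compare the algorithms above can be computed from ranks alone, as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal sketch (the labels and scores below are invented for illustration):

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC via the Mann-Whitney pairwise identity: the fraction of
    (positive, negative) pairs ranked correctly, ties counting half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # all positive/negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy example: 3 positives, 3 negatives, one mis-ranked pair.
y = np.array([0, 0, 1, 1, 1, 0])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2])
print(round(auroc(y, s), 3))   # → 0.889 (8 of 9 pairs correct)
```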
Our study sheds light on the challenges faced when balancing accuracy and complexity in machine learning models for landslide susceptibility mapping. It emphasizes the importance of carefully considering the level of complexity and entropy reduction in relation to the specific patterns and uncertainties present in the data. By providing insights into this tradeoff, our research aims to assist researchers and practitioners in making informed decisions regarding model complexity and entropy reduction, ultimately improving the quality and interpretability of landslide susceptibility maps.
Subject:
Computer Science And Mathematics,
Artificial Intelligence And Machine Learning
Keywords:
artificial neural networks; Bayesian techniques; metaheuristic techniques; hyperparameters; feature selection techniques
Online: 26 July 2023 (03:37:57 CEST)
Landslides are the most frequent and conspicuous natural calamity in the Karakoram region. Extreme landslides occur repeatedly along the Karakoram Highway (KKH), particularly during the monsoon, causing major loss of life and property. It is therefore necessary to improve monitoring and vigilance in order to lessen landslide-related losses, and contemporary technologies make it possible to develop an early warning system. Artificial neural networks (ANNs) are now widely used across many industries. The main goal of this paper is to provide new integrative models for assessing landslide susceptibility in a prone area of northern Pakistan. To do this, the training of an artificial neural network (ANN) is guided by metaheuristic and Bayesian techniques: particle swarm optimization (PSO), the genetic algorithm (GA), Bayesian optimization with a Gaussian process (BO_GP), and Bayesian optimization with a tree-structured Parzen estimator (BO_TPE). A geospatial database was formed from 304 previous landslides and the eight most prevalent conditioning factors. The models were hyperparameter-optimized, and the best ones were employed to generate the susceptibility maps. The area under the receiver operating characteristic curve (AUROC) accuracy index demonstrated that the maps produced by both the Bayesian and the metaheuristic algorithms are highly accurate. This research studies the effectiveness and efficiency of applying ANNs to landslide mapping, susceptibility analysis, and forecasting. Experimentation shows that the performance differences of GA, BO_GP, and PSO relative to BO_TPE are relatively small, ranging from 0.3166% to 1.8399%, suggesting that these techniques achieve performance comparable to BO_TPE in terms of AUC.
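The metaheuristic tuners named above all search a hyperparameter space by iteratively proposing and scoring candidates. As one illustrative case, a minimal genetic-algorithm loop is sketched below; the two "hyperparameters" (a learning-rate exponent and a hidden-unit count), the quadratic stand-in for validation score, and all constants are hypothetical, not the paper's actual ANN objective.

```python
import numpy as np

rng = np.random.default_rng(1)

def val_score(lr_exp, hidden):
    """Toy stand-in for validation AUROC as a function of two ANN
    hyperparameters; the peak at (-2, 64) is purely illustrative."""
    return 1.0 - 0.05 * (lr_exp + 2) ** 2 - 0.00002 * (hidden - 64) ** 2

# Minimal GA: truncation selection + Gaussian mutation, no crossover.
pop = np.column_stack([rng.uniform(-4, 0, 20),     # log10 learning rate
                       rng.uniform(8, 256, 20)])   # hidden units
for _ in range(30):
    fitness = np.array([val_score(*ind) for ind in pop])
    parents = pop[np.argsort(fitness)[-5:]]        # keep the top 5
    children = (parents[rng.integers(0, 5, 15)]
                + rng.normal(0.0, [0.1, 4.0], (15, 2)))  # mutate copies
    pop = np.vstack([parents, children])

best = pop[np.argmax([val_score(*ind) for ind in pop])]
print(f"best lr exponent ~ {best[0]:.2f}, hidden units ~ {best[1]:.0f}")
```

PSO would replace the selection/mutation step with velocity updates toward personal and global bests, while BO_GP and BO_TPE would fit a surrogate model to the evaluated points and propose the next candidate by maximizing an acquisition function.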
However, it is important to note that the significance of these differences can vary depending on the specific context and requirements of the ML task. Additionally, in this study we explore eight feature selection algorithms to determine the geospatial variable importance for landslide susceptibility mapping along the KKH. The algorithms considered include Information Gain, Gain Ratio, OneR Classifier, Subset Evaluators, Principal Components, Relief Attribute Evaluator, Correlation, and Symmetrical Uncertainty. These algorithms enable us to evaluate the relevance and significance of different geospatial variables in predicting landslide susceptibility. By applying these feature selection algorithms, we aim to identify the most influential geospatial variables that contribute to landslide occurrences along the KKH. The algorithms encompass a diverse range of techniques, such as measuring entropy reduction, accounting for attribute bias, generating single rules, evaluating feature subsets, reducing dimensionality, and assessing correlation and information sharing. The findings of this study will provide valuable insights into the critical geospatial variables associated with landslide susceptibility along the KKH. These insights can aid in the development of effective landslide mitigation strategies, infrastructure planning, and targeted hazard management efforts. Additionally, the study contributes to the field of geospatial analysis by showcasing the applicability and effectiveness of various feature selection algorithms in the context of landslide susceptibility mapping.
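Information Gain, the first of the feature selection measures listed above, is simply the entropy reduction in the class labels obtained by conditioning on a (discretized) feature. A small sketch, with hypothetical discretized conditioning factors in place of the real KKH variables:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Entropy reduction in `labels` from splitting on `feature`:
    H(labels) minus the split-weighted conditional entropy."""
    gain = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

# Invented example: slope class separates the labels perfectly,
# lithology class is nearly uninformative.
y         = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # landslide / no landslide
slope_cls = np.array([2, 2, 2, 0, 0, 1, 2, 1])
litho_cls = np.array([0, 1, 0, 1, 0, 1, 0, 1])
print(round(information_gain(slope_cls, y), 3))  # → 1.0
print(round(information_gain(litho_cls, y), 3))  # → 0.189
```

Gain Ratio divides this quantity by the entropy of the feature itself to correct the bias toward many-valued attributes, and Symmetrical Uncertainty normalizes it by the sum of both entropies.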