1. Introduction
Machine learning (ML) algorithms have been applied on a massive scale across many fields, including computer vision, natural language understanding, recommendation systems, user behavior analysis, and marketing [1]. This is because they are versatile and effective at solving data-driven problems. Different ML algorithms suit different kinds of datasets and problems [2]. In general, building competent ML models requires efficient tuning of hyper-parameters according to the specifications of the chosen model [3].
Several alternatives must be examined to design and implement the most efficient ML model. Hyper-parameter optimization is the process of finding the optimal hyper-parameter configuration and, with it, an ideal model architecture. Tuning hyper-parameters is crucial to building a successful ML model, especially for deep neural networks and tree-based ML models, which have an abundance of hyper-parameters. The optimization process differs across ML algorithms because they employ different kinds of hyper-parameters: discrete, categorical, and continuous [4]. Manual trial-and-error tuning of hyper-parameters is still widely used, even by graduate researchers, despite requiring a thorough understanding of ML algorithms and of the importance of their hyper-parameter configurations [5]. However, manual tuning is ineffective for several reasons: complex models, numerous hyper-parameters, lengthy evaluations, and non-linear interactions between hyper-parameters. These factors have spurred further research on techniques for automatic hyper-parameter optimization (HPO) [6].
The principal objective of hyper-parameter optimization (HPO) is to streamline the tuning process and enable users to apply machine learning models effectively to real-world problems [3]. Upon completion of an HPO procedure, one expects to obtain the optimal architecture for an ML model. Noteworthy justifications for applying HPO techniques to ML models include:
It reduces the human labor required, since many ML practitioners spend substantial time tuning hyper-parameters, especially for large datasets or complex algorithms with many hyper-parameters.
It improves the performance of ML models. Many hyper-parameters have different optimal values for different datasets or problems.
It improves the reproducibility of models and studies. Different ML algorithms can only be compared fairly when the same degree of hyper-parameter tuning is applied; applying the same HPO approach to several ML algorithms therefore also helps identify the best ML model for a given problem.
Identifying the most appropriate hyper-parameters requires selecting an appropriate optimization technique. Since many HPO problems are complex non-linear optimization problems, they may converge to a local rather than a global optimum, so standard optimization methods are often inappropriate for HPO [7]. For continuous hyper-parameters, gradients can be computed, and gradient descent-based techniques, a typical class of conventional optimization algorithms, can be applied [8]. For example, a gradient-based method may be employed to tune the learning rate of a neural network.
Many other families of methods, such as decision-theoretic techniques, multi-fidelity optimization methods, Bayesian optimization models, and metaheuristic algorithms, are better suited to HPO problems than traditional optimization techniques like gradient descent [4]. Several of these algorithms can handle conditional, categorical, and discrete hyper-parameters as well as continuous ones.
Decision-theoretic methods are founded on the idea of defining a hyper-parameter search space, evaluating hyper-parameter combinations within that space, and choosing the best-performing combination. Grid search (GS) [9], a decision-theoretic strategy, scans through a predetermined grid of hyper-parameter values. Random search (RS) [10], another decision-theoretic approach, is used when execution time and resources are limited; it randomly samples hyper-parameter combinations from the search space. In both GS and RS, each hyper-parameter configuration is evaluated independently.
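As a concrete illustration, GS enumerates every value in the grid while RS samples only a subset of them; the sketch below uses scikit-learn on a synthetic dataset (the dataset and the 1 to 20 range are illustrative, not the study's own setup):

```python
# Hedged sketch: grid search vs random search over the single KNN
# hyper-parameter 'n_neighbors'; dataset and ranges are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)

space = {"n_neighbors": list(range(1, 21))}

# GS tries all 20 values; RS samples only 8 of them.
gs = GridSearchCV(KNeighborsClassifier(), space, cv=3).fit(X, y)
rs = RandomizedSearchCV(KNeighborsClassifier(), space, n_iter=8, cv=3,
                        random_state=0).fit(X, y)

print(gs.best_params_, gs.best_score_)
print(rs.best_params_, rs.best_score_)
```

Note that both searches evaluate each sampled configuration independently, which is why they parallelize easily but learn nothing from earlier evaluations.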
Bayesian optimization (BO) [11] models, in contrast to GS and RS, choose the next hyper-parameter values based on the outcomes of previously tried values, avoiding many unnecessary evaluations. Consequently, BO can identify the optimal hyper-parameter combination with fewer rounds of testing than GS and RS. BO can employ several surrogate models, such as tree-structured Parzen estimators (TPE), random forests (RF), and Gaussian processes (GP) [12], to model the distribution of the objective function. BO-RF and BO-TPE [12] can preserve dependencies among variables, so they can optimize conditional hyper-parameters such as the kernel type and the penalty parameter C of a support vector machine (SVM). Parallelizing BO models is difficult because they work sequentially, striking a balance between exploring unexplored regions and exploiting regions already known to perform well.
Training an ML model often demands extensive time and resources. To address resource constraints, multi-fidelity optimization algorithms, particularly bandit-based ones, are widely used. Hyperband [13], a prevalent bandit-based method, is an improved version of RS. It evaluates configurations on downsized subsets of the data and initially assigns an equal budget to every hyper-parameter configuration. To save time and resources, Hyperband discards the inferior hyper-parameter configurations in each round.
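The budget schedule that Hyperband builds on can be sketched with plain successive halving: evaluate many configurations cheaply, keep only the best fraction each round, and multiply the survivors' budget. The objective below is a toy stand-in, not the study's model:

```python
# Hedged sketch of the successive-halving schedule underlying Hyperband.
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    budget = min_budget
    while len(configs) > 1:
        scores = [(evaluate(c, budget), c) for c in configs]
        scores.sort(reverse=True)              # higher score = better
        keep = max(1, len(configs) // eta)     # discard inferior configurations
        configs = [c for _, c in scores[:keep]]
        budget *= eta                          # survivors get a larger budget
    return configs[0]

def evaluate(lr, budget):
    # Toy objective: best near lr = 0.1; improves slightly with budget.
    return -(lr - 0.1) ** 2 - 1.0 / budget

random.seed(0)
candidates = [random.uniform(0.001, 1.0) for _ in range(27)]
best = successive_halving(candidates, evaluate)
print(best)
```

With eta = 3, the 27 candidates shrink to 9, then 3, then 1, while the per-configuration budget triples each round, which is what makes the method cheap despite the large initial pool.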
HPO problems are intricate, non-linear optimization problems with large search spaces, and metaheuristic algorithms are well suited to them [14]. The two most commonly employed metaheuristics for HPO are particle swarm optimization (PSO) and the genetic algorithm (GA) [15,16]. In each generation, a GA identifies the best-performing hyper-parameter combinations and passes them on to the next generation. In each iteration of PSO, every particle communicates with the others to identify and update the current global best until the final optimum is reached. Metaheuristics can efficiently explore the search space and discover optimal or near-optimal solutions. Because of this efficiency, they are highly appropriate for HPO problems with large configuration spaces.
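A minimal PSO sketch for a single continuous hyper-parameter, assuming a toy objective in place of a cross-validated model score:

```python
# Hedged, minimal PSO sketch; all constants (w, c1, c2) are conventional
# defaults, and the objective is a stand-in for a model's CV error.
import numpy as np

def pso(objective, lo, hi, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n_particles)        # particle positions
    v = np.zeros(n_particles)                   # particle velocities
    pbest, pbest_f = x.copy(), objective(x)     # personal bests
    g = pbest[np.argmin(pbest_f)]               # global best so far
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)              # keep particles inside the range
        f = objective(x)
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)]           # particles share the global best
    return g

# Toy objective standing in for an error rate to minimize; minimum at 0.3.
best = pso(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
print(best)
```

Because all particles can be evaluated simultaneously within an iteration, this structure is what makes PSO easy to parallelize, in contrast to the sequential BO methods above.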
Although HPO algorithms are immensely useful for improving ML models by tuning their hyper-parameters, other aspects, such as their computational complexity, still leave much room for progress. Since different HPO models have distinct advantages and limitations that suit them to specific model types and problems, all of these aspects must be weighed when selecting an optimization algorithm. This article makes the following contributions:
It covers three well-known machine learning algorithms (SVM, RF, and KNN) and their fundamental hyper-parameters.
It assesses conventional HPO methodologies, including their pros and cons, to facilitate applying them to different ML models by selecting the fitting algorithm in practical circumstances.
It investigates the impact of HPO techniques on the overall accuracy of landslide susceptibility mapping.
It quantifies the accuracy gains from baseline, default parameters to fine-tuned parameters, and their impact, on three well-known machine learning methods.
This overview article provides a comprehensive analysis of optimization approaches used for ML hyper-parameter adjustment issues. We specifically focus on the application of multiple optimization approaches to enhance model accuracy for landslide susceptibility mapping. Our discussion encompasses the essential hyper-parameters of well-known ML models that require optimization, and we delve into the fundamental principles of mathematical optimization and hyper-parameter optimization. Furthermore, we examine various advanced optimization techniques proposed for addressing HPO problems. Through evaluation, we assess the effectiveness of different HPO techniques and their suitability for ML algorithms such as SVM, KNN, and RF.
To demonstrate the practical implications, we present the outcomes of applying various HPO techniques to three machine learning algorithms. We thoroughly analyze these results and also provide experimental findings from the application of HPO on landslide dataset. This allows us to compare different HPO methods and explore their efficacy in realistic scenarios. In conclusion, this overview article provides valuable insights into the optimization of hyper-parameters in machine learning, offering guidance for researchers and practitioners in selecting appropriate optimization techniques and effectively applying them to enhance the performance of ML models in various applications.
Study Area
A roughly 332 km stretch of the KKH expressway was analyzed. The full route spans 1300 km, connecting the Pakistani provinces of Punjab, Khyber Pakhtunkhwa, and Gilgit Baltistan with Xinjiang, an autonomous region of China. The study was conducted in northern Pakistan, in the Gilgit, Hunza, and Nagar districts. Various settlements lie along the KKH from Juglot, situated at 36°12′147″N latitude and 74°18′772″E longitude, through Jutal, Rahimbad, and Aliabad, to Khunjarab Top, the China–Pakistan border crossing. The locality lies along the Indus, Hunza, and Gilgit rivers. The evaluated zone measures 332 km in length and 10 km in width, covering 3320 km2 along the KKH. Most of the area is mountainous, with the highest peak reaching 5370 m and the lowest elevation 1210 m. Avalanches, landslides, and earthquakes are frequent natural hazards in this region. Rockslides and debris falls triggered by rainfall or seismic activity are the most prevalent landslide types in our study area (Figure 1).
Landslide Conditioning Factors
The eight landslide conditioning factors used in our case study are presented in Figure 2, and further details of these variables are given in Table 1 below.
2. Methodology
As a starting point, the landslide dataset along the KKH, a pure classification problem, serves as the benchmark dataset for examining HPO methods on a data analysis problem.
The subsequent step involves configuring the ML models with their objective functions. Based on the characteristics of their hyper-parameters, the popular ML models are categorized into five groups, explained in Section 3. The three most common of these categories are "one discrete hyper-parameter," "a few conditional hyper-parameters," and "a large hyper-parameter configuration space with multiple types of hyper-parameters." RF, KNN, and SVM are chosen as the three ML algorithms to be tuned because their hyper-parameter types correspond to these three typical HPO scenarios: the number of nearest neighbors considered for each sample is the crucial hyper-parameter of KNN; the kernel type and the penalty parameter C are a few conditional hyper-parameters of SVM; and, as described in Section 6, RF has many hyper-parameters of several kinds. Additionally, KNN, SVM, and RF can all solve classification problems.
The evaluation metric and evaluation method are determined in the subsequent step. The HPO methods employed in our experiments on the chosen dataset are evaluated using 3-fold cross-validation, with the two most common performance measurements. Accuracy, the proportion of correctly classified instances, is used as the performance metric for the classification models, and model efficiency is measured by computational time (CT), the overall time required to complete an HPO procedure with 3-fold cross-validation.
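A sketch of these two measures, assuming a synthetic stand-in for the landslide data:

```python
# Hedged sketch of the two evaluation measures used here: 3-fold
# cross-validated accuracy and the computational time (CT) of a run.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)  # stand-in data

start = time.time()
acc = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                      cv=3, scoring="accuracy").mean()
ct = time.time() - start                                   # seconds elapsed

print(f"accuracy={acc:.4f}, CT={ct:.2f}s")
```

In the actual experiments CT covers the whole HPO procedure, i.e., this evaluation repeated for every configuration the optimizer tries.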
Subsequently, a number of criteria must be met to compare the various optimization methods and frameworks fairly. First, we use the same hyper-parameter configuration space for every HPO technique. In each evaluation of an optimization approach, the single KNN hyper-parameter, 'n_neighbors', is set to the same range of 1 to 20. The hyper-parameters of the SVM and RF classification models are likewise set to the same configuration space for each technique.
Table 3 displays the configuration spaces for the ML models.
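Since the exact ranges are given in Table 3, the dictionaries below are only an assumed illustration of the three configuration-space shapes described above:

```python
# Assumed illustration of the three configuration-space shapes; only the
# KNN range (1 to 20) is stated in the text, the rest are placeholders.
knn_space = {"n_neighbors": list(range(1, 21))}   # one discrete hyper-parameter

svm_space = {                                     # a few conditional hyper-parameters
    "kernel": ["linear", "rbf", "poly"],          # categorical
    "C": (0.1, 100.0),                            # continuous range (assumed)
}

rf_space = {                                      # large space, mixed types (assumed)
    "n_estimators": (10, 200),                    # discrete
    "max_depth": (5, 50),                         # discrete
    "max_features": ["sqrt", "log2"],             # categorical
    "criterion": ["gini", "entropy"],             # categorical
}

print(len(knn_space["n_neighbors"]), len(svm_space), len(rf_space))
```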
Drawing on the notions presented in Section 3 and on manual experimentation, the selected hyper-parameters and their search domains were identified [120]. Table 2 also details the hyper-parameter categories for each ML technique.
Section 4 introduces six different hyper-parameter optimization (HPO) approaches. To evaluate their performance, we chose six representative HPO methods discussed in Section 4: grid search (GS), random search (RS), Bayesian optimization with a Gaussian process (BO-GP), Bayesian optimization with the tree-structured Parzen estimator (BO-TPE), the genetic algorithm (GA), and particle swarm optimization (PSO). To ensure unbiased empirical conditions for each HPO approach, the experiments were carried out following the procedures outlined in Section 2. Python 3.5 was used for all experiments, which were run on a system with a Core i7 processor and 32 GB of RAM. A variety of open-source Python modules and frameworks were used to investigate the associated ML and HPO methods, including sklearn [30], Skopt [110], Hyperopt [106], Optunity [79], Hyperband [16], BOHB [93], and TPOT [118].
3. Hyper-Parameters
Hyperparameter configuration characteristics can be used to categorize ML algorithms. Based on these features, suitable optimization methods can be selected to optimize the hyper-parameters.
3.1. Discrete Hyper-Parameter
A discrete hyper-parameter typically needs to be tuned for some ML algorithms, such as certain neighbor-based, clustering, and dimensionality reduction algorithms. The primary hyper-parameter of KNN is the number of considered neighbors, k. The number of clusters is the most important hyper-parameter for k-means, hierarchical clustering, and expectation-maximization (EM). Similarly, the fundamental hyper-parameter of dimensionality reduction techniques like PCA and LDA is 'n_components', the number of components to extract. The best option in these circumstances is Bayesian optimization, and the three surrogates may be compared to see which is most effective. Another excellent option is Hyperband, which can have a short execution time because of its parallelization capabilities. In some cases, users may want to fine-tune the model further by considering other, less significant hyper-parameters, such as the distance metric of KNN or the SVD solver type of PCA; in those cases, BO-TPE, GA, or PSO could be used.
3.2. Continuous Hyper-Parameter
Several naïve Bayes (NB) algorithms, such as multinomial NB, Bernoulli NB, and complement NB, as well as the ridge and lasso methods for linear models, typically have only one crucial continuous hyper-parameter that needs to be set. For ridge and lasso, the continuous hyper-parameter is 'alpha', the regularization strength. The key hyper-parameter of the three NB algorithms above, also commonly named 'alpha', actually refers to the additive (Laplace/Lidstone) smoothing value. The best option for these ML algorithms is BO-GP, since it excels at optimizing a small number of continuous hyper-parameters. Gradient-based algorithms are also possible, but they may only find local optima, making them less effective than BO-GP.
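The BO-GP loop can be sketched with scikit-learn's Gaussian process and a simple lower-confidence-bound acquisition; the objective below is a toy stand-in for the cross-validated error at a given 'alpha', and the ranges are assumptions:

```python
# Hedged BO-GP sketch for one continuous hyper-parameter ('alpha'):
# fit a GP surrogate to the evaluated points, then pick the next alpha
# where (mean - 1.96 * std) is lowest, trading off mean and uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(alpha):
    # Toy stand-in for a cross-validated error; minimized at alpha = 1.0.
    return np.log10(alpha) ** 2

candidates = np.logspace(-3, 3, 200).reshape(-1, 1)   # assumed 'alpha' range
X_obs = candidates[::40].copy()                       # a few initial evaluations
y_obs = [objective(a[0]) for a in X_obs]

for _ in range(8):
    gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
    gp.fit(np.log10(X_obs), y_obs)                    # surrogate on log scale
    mu, sigma = gp.predict(np.log10(candidates), return_std=True)
    nxt = candidates[np.argmin(mu - 1.96 * sigma)]    # lower-confidence-bound pick
    X_obs = np.vstack([X_obs, [nxt]])
    y_obs.append(objective(nxt[0]))

best_alpha = X_obs[int(np.argmin(y_obs))][0]
print(best_alpha)
```

Note the sequential structure: each new evaluation depends on the surrogate fitted to all previous ones, which is exactly why BO is hard to parallelize.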
3.3. Conditional Hyper-Parameters
Many ML algorithms, including SVM, LR, and DBSCAN, have conditional hyper-parameters. 'penalty', 'C', and the solver type are the three correlated hyper-parameters of LR. Similarly, in DBSCAN, 'eps' and 'min_samples' need to be tuned together. SVM is more complicated because choosing a kernel type introduces a distinct set of conditional hyper-parameters to calibrate. As a result, HPO techniques that cannot effectively optimize conditional hyper-parameters, such as GS, RS, BO-GP, and Hyperband, are not appropriate for these ML models. If the correlations between the hyper-parameters are known in advance, BO-TPE is the ideal option for these ML approaches. SMAC is another option that works well for tuning conditional hyper-parameters, and GA and PSO can also be used.
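Conditionality can be made concrete with a sampler in which the extra SVM hyper-parameters only exist for the kernel that was drawn (all ranges are illustrative):

```python
# Hedged sketch of a conditional SVM search space: which extra
# hyper-parameters appear depends on the sampled kernel type.
import random

def sample_svm_config(rng):
    config = {
        "C": 10 ** rng.uniform(-1, 2),                   # always present
        "kernel": rng.choice(["linear", "rbf", "poly"]),
    }
    if config["kernel"] == "rbf":
        config["gamma"] = 10 ** rng.uniform(-3, 0)       # only meaningful for RBF
    elif config["kernel"] == "poly":
        config["degree"] = rng.randint(2, 5)             # only meaningful for poly
    return config

rng = random.Random(0)
configs = [sample_svm_config(rng) for _ in range(5)]
for c in configs:
    print(c)
```

Methods like GS and RS flatten this structure and waste evaluations on meaningless combinations (e.g., a 'degree' with a linear kernel), whereas tree-structured methods like BO-TPE sample it natively.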
3.4. Categorical Hyper-Parameters
Ensemble learning algorithms tend to fall into this category, given that their primary hyper-parameter is categorical. For bagging and AdaBoost, the categorical hyper-parameter is 'base_estimator', which is set to a single ML model. For voting, 'estimators' denotes the list of single ML models to be combined, and 'voting' is a further categorical hyper-parameter that selects between hard and soft voting. GS is usually adequate for evaluating the candidate values of these categorical hyper-parameters. However, other hyper-parameters, such as 'n_estimators', 'max_samples', and 'max_features' in bagging, and 'n_estimators' and 'learning_rate' in AdaBoost, frequently need to be considered as well; for these continuous or discrete hyper-parameters, BO algorithms are a better option. In conclusion, the most appropriate HPO method should be chosen based on the characteristics of a model's hyper-parameters, so as to obtain high model performance at low computational cost.
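A hedged sketch of grid-searching AdaBoost's 'n_estimators' and 'learning_rate' with scikit-learn (the grid values and dataset are illustrative):

```python
# Hedged sketch: small grid search over AdaBoost's 'n_estimators'
# and 'learning_rate'; grid values and dataset are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

grid = {"n_estimators": [25, 50, 100], "learning_rate": [0.1, 0.5, 1.0]}
search = GridSearchCV(AdaBoostClassifier(random_state=0), grid, cv=3).fit(X, y)

print(search.best_params_, round(search.best_score_, 4))
```

Here GS is affordable because the grid has only nine combinations; for finer-grained continuous ranges of 'learning_rate', a BO method would use far fewer evaluations.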
3.5. Big Hyper-Parameter Configuration Space with Different Types of Hyper-Parameters
Tree-based ML algorithms such as DT, RF, ET, and XGBoost, as well as DL algorithms such as DNN, CNN, and RNN, are the most difficult to tune because they have numerous hyper-parameters of many different types. PSO is the ideal option for these models, since it allows parallel execution to improve efficiency, which matters especially for DL models that require long training times. Other techniques like GA, BO-TPE, and SMAC can also be used, though they may take longer than PSO because these approaches are difficult to parallelize.
5. Mathematical and Hyper-Parameter Optimization
Training a machine learning model is itself an optimization task: a weight-optimization technique is applied until the objective function reaches a minimum and the accuracy reaches a maximum. Similarly, hyper-parameter optimization methods aim to improve the architecture of an ML model. This section covers the fundamental ideas of mathematical optimization and of hyper-parameter optimization for ML models.
5.1. Mathematical Optimization
The aim of mathematical optimization is to find, from a pool of candidates, the solution that maximizes or minimizes an objective function [35]. Depending on whether restrictions are placed on the decision variables, optimization problems are classified as constrained or unconstrained. In an unconstrained problem, a decision variable x can take any value from the one-dimensional space of real numbers, R; such a problem can be written as [36]:

$$\min_{x \in \mathbb{R}} f(x),$$

where f(x) is the objective function.
In contrast, constrained optimization problems are more prevalent in real-world applications. In a constrained problem, the decision variable x must satisfy specific constraints, expressed mathematically as equalities or inequalities. A general constrained optimization problem can be written as [36]:

$$\min_{x \in X} f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i = 1,\dots,m, \qquad h_j(x) = 0,\; j = 1,\dots,p,$$

where X is the domain of x, the g_i(x) are the inequality constraint functions, and the h_j(x) are the equality constraint functions.
Constraints limit the feasible region, i.e., the possible values of the optimal solution, to specific parts of the search space. The feasible region D of x can therefore be written as:

$$D = \{\, x \in X \mid g_i(x) \le 0,\; i = 1,\dots,m;\;\; h_j(x) = 0,\; j = 1,\dots,p \,\}.$$
An optimization problem thus has three main components: an objective function f(x) to be minimized or maximized, a collection of decision variables x, and, if the problem is constrained, a set of constraints that restrict the values the variables may take. The aim is to determine the variable values that minimize or maximize the objective function while satisfying all constraints.
Typical constraints in HPO problems include the feasible range of the cluster count in k-means, as well as time and space limits. Consequently, constrained optimization methods are frequently employed for HPO.
In many situations, optimization procedures converge to a local optimum rather than the global optimum. For example, when seeking the minimum of a problem, suppose D is the feasible region of a decision variable x. A global minimum is a point x* that satisfies

$$f(x^*) \le f(x) \quad \forall x \in D,$$

whereas a local minimum is a point x′ that, within some neighborhood N ⊆ D, satisfies

$$f(x') \le f(x) \quad \forall x \in N$$

[36]. Thus, a local optimum is only optimal within a limited range and might not be the best solution over the entire feasible region.
Only for convex functions is a local optimum guaranteed to also be the global optimum [37]. A convex function has a single (global) optimum, so the global optimal value can be found by continuing the search in the direction in which the objective function decreases.
Formally, f(x) is a convex function if and only if [37], for all x_1, x_2 in X,

$$f(t x_1 + (1 - t)\, x_2) \le t\, f(x_1) + (1 - t)\, f(x_2),$$

where t is a coefficient in the range [0, 1] and X is the domain of the decision variables. An optimization problem is a convex optimization problem only when the feasible region C is a convex set and the objective function f(x) is a convex function [37]:

$$\min f(x) \quad \text{subject to} \quad x \in C.$$
Conversely, a non-convex function may have many local optima in addition to its global optimum, and the majority of ML and HPO problems are non-convex optimization problems. With inappropriate optimization techniques, they frequently converge to a local rather than the global optimum.
Traditional techniques such as gradient descent, Newton's method, and conjugate gradient, as well as heuristic optimization techniques, can all be applied to optimization problems [35]. Gradient descent is a popular technique that searches along the negative gradient direction as it approaches the optimum. However, gradient descent is only guaranteed to find the global optimum when the objective function is convex. Newton's method uses the inverse of the Hessian matrix to determine the optimal solution; it converges faster than gradient descent but requires more time and space to compute and store the Hessian.
Conjugate gradient searches along conjugate directions constructed from the gradients at known data points. It converges faster than gradient descent, but its computation is more involved. Heuristic methods, in contrast to the conventional approaches above, solve optimization problems with empirical rules rather than a fixed sequence of deterministic steps. They frequently find an approximate global optimum within a few iterations, although they cannot guarantee finding the exact global optimum [35].
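Gradient descent on a convex quadratic illustrates why convexity matters: every step against the gradient decreases the objective until the unique minimum is reached. A minimal sketch:

```python
# Minimal gradient-descent sketch on the convex quadratic f(x) = (x - 2)^2,
# whose gradient is 2 * (x - 2); the global minimum is at x = 2.
def gradient_descent(grad, x0, lr=0.1, iters=100):
    x = x0
    for _ in range(iters):
        x -= lr * grad(x)          # step against the gradient
    return x

x_star = gradient_descent(lambda x: 2 * (x - 2), x0=10.0)
print(x_star)                      # converges toward 2.0
```

On a non-convex objective the same update rule would simply settle into whichever basin contains the starting point, which is the local-optimum failure mode described above.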
5.2. Hyper-Parameter Optimization
Throughout the ML model design phase, the optimal hyper-parameters can be identified by efficiently exploring the hyper-parameter space with optimization techniques. The hyper-parameter optimization procedure comprises four key constituents: an estimator (a regressor or classifier) with an objective function; a search space (configuration space); an optimization or search method to find hyper-parameter combinations; and an evaluation function to gauge how well different hyper-parameter configurations perform.
Hyper-parameters, like the learning rate or whether to employ early stopping, can have categorical, binary, discrete, continuous, or mixed domains; hyper-parameters thus fall into three broad categories: categorical, continuous, and discrete. The domains of continuous and discrete hyper-parameters are usually bounded in real-world applications. Configuration spaces can also include conditional hyper-parameters, which must be adjusted depending on the value of another hyper-parameter [9,38]. In certain scenarios, hyper-parameters may take unrestricted real values, and the set of feasible hyper-parameters, denoted X, can be an n-dimensional real vector space. In ML models, however, hyper-parameters usually have specific value ranges and are subject to various constraints, which makes their optimization problems constrained optimization problems [39]. For instance, in decision trees, the number of features considered must range from 0 to the total number of features, and in k-means, the number of clusters must not exceed the number of data points [7].
Moreover, categorical hyper-parameters typically have a restricted set of allowable values, such as the activation function and optimizer choices in a neural network. Consequently, the optimization problem is harder because the feasible domain X often has a complex structure [39].
Typically, the goal of a hyper-parameter optimization task is to obtain [16]:

$$x^* = \arg\min_{x \in X} f(x),$$

where a hyper-parameter configuration x can take any value in the search space X, and f(x) is the objective function to be minimized, such as the error rate or the root mean squared error (RMSE). The optimal hyper-parameter configuration, x*, is the one that yields the best value of f(x).
The objective of HPO is to fine-tune hyper-parameters within the allocated budgets to attain optimal or nearly optimal model performance. The mathematical expression of the function f varies depending on the performance metric function and the objective function of the chosen ML algorithm. Various metrics, such as F1-score, accuracy, RMSE, and false alarm rate, can be utilized to evaluate the model's performance. In practical applications, time constraints must also be considered, as they are a significant limitation for optimizing HPO models. With a considerable number of hyper-parameter configurations, optimizing the objective function of an ML model can be exceedingly time-consuming. Each time a hyper-parameter value is assessed, the entire ML model must be retrained, and the validation set must be processed to produce a score that quantifies the model's performance.
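The formulation above can be written directly as code: f(x) is the 3-fold cross-validated error rate of one configuration, and HPO is the argmin over the search space X (the dataset and model here are illustrative):

```python
# Hedged sketch of the HPO objective: f(x) = 3-fold CV error rate of a
# KNN model at one configuration; x* is the argmin over the space X.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_data, y_data = make_classification(n_samples=300, random_state=0)

def f(n_neighbors):
    """Objective: error rate = 1 - mean CV accuracy for one configuration."""
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=n_neighbors),
                          X_data, y_data, cv=3, scoring="accuracy").mean()
    return 1.0 - acc

search_space = range(1, 21)
x_star = min(search_space, key=f)   # exhaustive argmin over X
print(x_star, f(x_star))
```

Each evaluation of f retrains the model and runs the full cross-validation, which is exactly why the number of evaluations, not just the search strategy, dominates the cost of HPO.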
After choosing an ML algorithm, the primary HPO procedure involves the following steps [
7]:
Choose the performance measurements and the objective function.
Identify the hyper-parameters that need tuning, list their categories, and select the optimal optimization method.
Train the ML model using the default hyper-parameter setup or common values for the baseline model.
Commence the optimization process with a broad search space, selected through manual testing and/or domain expertise, as the feasible hyperparameter domain.
If required, explore additional search spaces or narrow down the search space based on the regions where best functioning hyper-parameter values have been recently evaluated.
Finally, provide the hyper-parameter configuration that exhibits the best performance.
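The steps above can be sketched end-to-end: train a default-parameter baseline, then search a broad space and report the best configuration (the space and dataset are illustrative):

```python
# Hedged end-to-end sketch of the HPO procedure: default baseline first,
# then a broad randomized search, then the best configuration found.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Baseline with default hyper-parameters.
baseline = cross_val_score(RandomForestClassifier(random_state=0),
                           X, y, cv=3).mean()

# Broad (illustrative) search space, then optimization.
space = {"n_estimators": [50, 100, 200], "max_depth": [5, 10, None]}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0), space,
                            n_iter=5, cv=3, random_state=0).fit(X, y)

print(f"baseline={baseline:.4f}, tuned={search.best_score_:.4f}",
      search.best_params_)
```

In practice one would then narrow the space around the best-performing region and repeat, as step 5 describes.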
The majority of typical optimization approaches [40] are inappropriate for HPO, as HPO problems differ from conventional optimization problems in the following ways [7].
When it comes to HPO problems, conventional optimization techniques that are designed for convex or differentiable optimization problems are often not suitable due to the non-convex and non-differentiable nature of the objective function in ML models. Moreover, even some conventional derivative-free optimization methods perform poorly when the optimization target is not smooth [
41].
ML models' hyper-parameters contain continuous, discrete, categorical, and conditional hyper-parameters, which means that numerous conventional numerical optimization techniques that only deal with numerical or continuous variables are not suitable for HPO problems [
42].
In HPO, evaluating an ML model on a large dataset can be costly, so data sampling is sometimes used to approximate the values of the objective function, and efficient HPO methods must be able to exploit these approximations. However, many black-box optimization (BBO) methods do not consider function evaluation time, which makes them unsuitable for HPO problems with limited time and resources. Finding the best hyper-parameter configurations for ML models therefore requires applying optimization methods suited to HPO problems.
7. Results
Tables 4–6 present the results of six different HPO methods applied to RF, SVM, and KNN classifiers on the landslide dataset. The default hyper-parameter configurations of each model were used as the baseline, and then HPO algorithms were applied to assess their accuracy and computational time. The results show that default settings do not always lead to the best model performance, highlighting the importance of HPO techniques.
As baselines for HPO, GS and RS were used, and the results indicate that GS often has a considerably higher computational time than the other optimization techniques. RS is faster than GS, but neither can guarantee finding near-optimal hyper-parameter configurations. BO and multi-fidelity models perform considerably better than GS and RS in terms of accuracy, but BO-GP often requires longer computation times due to its cubic time complexity.
BO-TPE and BOHB frequently perform better than other methods due to their ability to quickly compute optimal or almost optimal hyper-parameter configurations. GA and PSO also frequently have higher accuracies than other HPO approaches for classification tasks. BO-TPE and PSO are often successful in finding good hyper-parameter configurations for ML models with vast configuration spaces.
Overall, GS and RS are easy to implement but may struggle to find ideal hyper-parameter configurations or take a long time to run. BO-GP and GA may take more time to compute than other HPO methods, but BO-GP performs better in small configuration spaces, while GA performs better in large configuration spaces. BO-TPE and PSO are effective for ML models with vast configuration spaces.
Performance analysis of the RF classifier using HPO methods on the landslide dataset is shown in Table 4.

| Optimization Algorithm | Accuracy | CT (s) |
| --- | --- | --- |
| GS | 0.90730 | 4.70 |
| RS | 0.92663 | 3.91 |
| BO-GP | 0.93266 | 16.94 |
| BO-TPE | 0.94112 | 1.43 |
| GA | 0.94957 | 4.90 |
| PSO | 0.95923 | 3.12 |
Performance analysis of the SVM classifier using HPO methods on the landslide dataset is shown in Table 5.

| Optimization Algorithm | Accuracy | CT (s) |
| --- | --- | --- |
| BO-TPE | 0.95289 | 0.55 |
| BO-GP | 0.94565 | 5.78 |
| PSO | 0.90277 | 0.43 |
| GA | 0.90277 | 1.18 |
| RS | 0.89855 | 0.73 |
| GS | 0.89794 | 1.23 |
Performance analysis of the KNN classifier using HPO methods on the landslide dataset is shown in Table 6.

| Optimization Algorithm | Accuracy | CT (s) |
| --- | --- | --- |
| BO-GP | 0.90247 | 1.21 |
| BO-TPE | 0.89462 | 2.23 |
| PSO | 0.89462 | 1.65 |
| GA | 0.88194 | 2.43 |
| RS | 0.88194 | 6.41 |
| GS | 0.78925 | 7.68 |
7.1. Landslide Susceptibility Maps
7.1.1. Random Forest
The metaheuristic algorithms PSO and GA performed remarkably well: PSO increased accuracy over the baseline optimization methods GS and RS by 5% and 3%, respectively, while GA increased accuracy over GS and RS by 4% and 2%. The Bayesian optimization techniques also improved on the baselines, with BO-TPE gaining 4% and 2% over GS and RS, respectively, and BO-GP gaining 3% and 1%. Thus, both metaheuristic and Bayesian optimization increased the overall accuracy of the RF model, as shown in the figure below. As discussed earlier, tree-based algorithms such as RF are among the most challenging ML algorithms to optimize because they have multiple hyper-parameters of different types. PSO works best for these models because it enables parallel execution, which improves efficiency. Other methods such as GA and BO-TPE can also be applied, but they may take longer than PSO because these techniques are difficult to parallelize.
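The PSO update rule underlying these results can be sketched with a minimal swarm loop over two RF-style hyper-parameters. The `score` surrogate, swarm size, and coefficient values below are illustrative assumptions, not the configuration used in this study; in practice, the per-particle evaluations inside the inner loop are the independent model trainings that PSO can run in parallel.

```python
import random

random.seed(1)

# Hypothetical surrogate for RF cross-validation accuracy over
# (n_estimators, max_depth); illustrative only, peaking at (300, 12).
def score(n_est, depth):
    return 1.0 - ((n_est - 300) / 500) ** 2 - ((depth - 12) / 20) ** 2

LOW, HIGH = (50, 2), (800, 32)           # search-space bounds per dimension
N_PARTICLES, ITERS = 8, 30
W, C1, C2 = 0.5, 1.5, 1.5                # inertia, cognitive, social weights

pos = [[random.uniform(LOW[d], HIGH[d]) for d in range(2)] for _ in range(N_PARTICLES)]
vel = [[0.0, 0.0] for _ in range(N_PARTICLES)]
pbest = [p[:] for p in pos]              # each particle's best-known position
gbest = max(pbest, key=lambda p: score(*p))[:]  # swarm's best-known position

for _ in range(ITERS):
    for i in range(N_PARTICLES):
        for d in range(2):
            r1, r2 = random.random(), random.random()
            # Velocity blends inertia with pulls toward pbest and gbest.
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])
                         + C2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] = min(max(pos[i][d] + vel[i][d], LOW[d]), HIGH[d])
        if score(*pos[i]) > score(*pbest[i]):
            pbest[i] = pos[i][:]
        if score(*pos[i]) > score(*gbest):
            gbest = pos[i][:]

print("best config:", [round(x, 1) for x in gbest], "score:", round(score(*gbest), 4))
```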
Figure 4.
Receiver-operating characteristic (ROC) curve and AUC curve of Random forest (RF) model with GS (Grid Search), RS (Random Search), BO-GP (Bayesian optimization Gaussian process), BO-TPE (Bayesian optimization Tree-structured Parzen estimator), GA (Genetic Algorithm) and PSO (Particle Swarm Optimization) as Parameter optimization techniques.
Figure 5.
Landslide susceptibility maps obtained from the Random Forest (RF) model using six different optimization techniques: GS (Grid Search), RS (Random Search), BO-GP (Bayesian optimization Gaussian process), BO-TPE (Bayesian optimization Tree-structured Parzen estimator), GA (Genetic Algorithm) and PSO (Particle Swarm Optimization).
7.1.2. KNN
The main hyperparameter requiring tuning in KNN is discrete: the number of neighbours to consider, k. As explained in the section on hyperparameters, Bayesian optimization is the best choice under these conditions, and as expected, the Bayesian approaches performed exceptionally well. For the KNN model, BO-TPE improved accuracy over the baseline algorithms RS and GS by 1% and 11%, respectively, while BO-GP improved on RS and GS by 2% and 12%. The metaheuristic algorithms PSO and GA performed similarly to BO-TPE and RS, respectively.
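The core TPE idea, concentrating new trials near previously good configurations instead of sampling uniformly, can be sketched in heavily simplified form for a single discrete hyper-parameter k. The `knn_accuracy` curve below is a hypothetical stand-in for cross-validated KNN accuracy, and the warm-up points and sampling ratios are illustrative assumptions.

```python
import random

random.seed(2)

# Hypothetical accuracy curve over the number of neighbours k; a stand-in
# for cross-validated KNN accuracy (the real curve comes from the data).
def knn_accuracy(k):
    return 0.9 - 0.0015 * (k - 11) ** 2

K_RANGE = list(range(1, 51))
# Warm-up trials spread across the range before model-based sampling starts.
history = [(k, knn_accuracy(k)) for k in (1, 13, 25, 37, 49)]

for _ in range(20):
    history.sort(key=lambda t: t[1], reverse=True)
    # Split trials into "good" (top quartile) and the rest; real TPE fits
    # densities l(x) and g(x) to the two groups and samples where l/g is high.
    good = [k for k, _ in history[: max(1, len(history) // 4)]]
    if random.random() < 0.8:
        # Exploit: propose a k near a previously good value.
        k = min(max(random.choice(good) + random.randint(-3, 3), 1), 50)
    else:
        # Explore: occasional uniform draw over the whole range.
        k = random.choice(K_RANGE)
    history.append((k, knn_accuracy(k)))

best_k, best_acc = max(history, key=lambda t: t[1])
print("best k:", best_k, "accuracy:", round(best_acc, 4))
```

Each new trial reuses everything learned so far, which is why TPE-style methods typically need far fewer evaluations than GS on a discrete space like this one.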
Figure 6.
Receiver-operating characteristic (ROC) curve and AUC curve of K-nearest neighbors (KNN) model with GS (Grid Search), RS (Random Search), BO-GP (Bayesian optimization Gaussian process), BO-TPE (Bayesian optimization Tree-structured Parzen estimator), GA (Genetic Algorithm) and PSO (Particle Swarm Optimization) as Parameter optimization techniques.
Figure 7.
Landslide susceptibility maps obtained from the K-nearest neighbors (KNN) model using six different optimization techniques: GS (Grid Search), RS (Random Search), BO-GP (Bayesian optimization Gaussian process), BO-TPE (Bayesian optimization Tree-structured Parzen estimator), GA (Genetic Algorithm) and PSO (Particle Swarm Optimization).
7.1.3. SVM
The Bayesian algorithms outperformed the other methods for the SVM model: BO-TPE produced 6% better outcomes than the baseline algorithms GS and RS, whereas BO-GP improved outcomes by 5%. PSO and GA performed similarly, each improving results by 1%, as shown in the figure below.
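For contrast with the GA results above, a compact GA loop over two continuous SVM hyper-parameters (log10 C and log10 gamma) can be sketched as follows. The surrogate `score`, the population size, and the crossover and mutation operators are illustrative assumptions rather than this study's actual setup.

```python
import random

random.seed(3)

# Hypothetical surrogate for SVM cross-validation accuracy over
# (log10 C, log10 gamma); illustrative only, peaking at (1.0, -2.0).
def score(log_c, log_g):
    return 0.95 - 0.01 * (log_c - 1.0) ** 2 - 0.01 * (log_g + 2.0) ** 2

BOUNDS = (-3.0, 3.0)
POP, GENS = 10, 15

def clip(x):
    return min(max(x, BOUNDS[0]), BOUNDS[1])

# Initial population of random (log10 C, log10 gamma) individuals.
pop = [(random.uniform(*BOUNDS), random.uniform(*BOUNDS)) for _ in range(POP)]

for _ in range(GENS):
    pop.sort(key=lambda ind: score(*ind), reverse=True)
    parents = pop[: POP // 2]                         # truncation selection (elitist)
    children = []
    while len(children) < POP - len(parents):
        a, b = random.sample(parents, 2)
        child = [(a[d] + b[d]) / 2 for d in range(2)]  # blend crossover
        if random.random() < 0.3:                      # Gaussian mutation
            d = random.randrange(2)
            child[d] = clip(child[d] + random.gauss(0, 0.5))
        children.append(tuple(child))
    pop = parents + children

best = max(pop, key=lambda ind: score(*ind))
print("best (log10 C, log10 gamma):", [round(x, 2) for x in best],
      "score:", round(score(*best), 4))
```

Because selection and crossover act on the whole population sequentially, generations cannot be skipped, which is the parallelization limitation noted for GA earlier.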
Figure 8.
Receiver-operating characteristic (ROC) curve and AUC curve of support vector machine (SVM) model with GS (Grid Search), RS (Random Search), BO-GP (Bayesian optimization Gaussian process), BO-TPE (Bayesian optimization Tree-structured Parzen estimator), GA (Genetic Algorithm) and PSO (Particle Swarm Optimization) as Parameter optimization techniques.
Figure 9.
Landslide susceptibility maps obtained from the Support Vector Machine (SVM) model using six different optimization techniques: GS (Grid Search), RS (Random Search), BO-GP (Bayesian optimization Gaussian process), BO-TPE (Bayesian optimization Tree-structured Parzen estimator), GA (Genetic Algorithm) and PSO (Particle Swarm Optimization).