In this section, we provide a detailed description of the performance of the proposed approach and the process of hyperparameter tuning. Additionally, we present graphical representations comparing the model's performance using different regression techniques. To evaluate the performance of REM construction, we employ various error metrics, including RMSE, MAE, and $R^2$, and present the results in Table 2. Furthermore, we assess the accuracy of indoor localization using location error measurements.
6.1. Model Evaluation
In this subsection, we provide a detailed description of the error metrics used to evaluate REM construction and the ML-based approach to indoor localization.
A measurement that depicts the average error of the estimates is the mean absolute error. To determine how well the predictions match the actual values, MAE calculates the absolute difference between the actual value, designated as $y_i$, and the corresponding predicted measure of the RSSI value, denoted as $\hat{y}_i$ [35]. Then, MAE can be expressed by

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \quad (1)$$

where $N$ is the total number of data samples for Equations (1), (2), (3), and (4).
The square root of the average of the squared discrepancies between predicted RSSI values $\hat{y}_i$ and the corresponding actual observations $y_i$ is used to calculate the root mean square error, a metric that measures the error rate. RMSE gives an indication of how well predictions reflect the actual values and is calculated by

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \quad (2)$$

For Equations (1) and (2), the performance of the system model is considered good if the result is lower, whereas $R^2$ is the opposite; performance is good when the $R^2$ result is higher:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \quad (3)$$

where $\bar{y}$ is the average target value.
The numerator of the second term is the sum of squares of the residual prediction errors, and the denominator is the variance of the target values [36,37]. The fundamental goal of the $R^2$ score is to quantify how much of the variation in the target-dependent variable is predictable from the independent variables in a regression model. The score has no lower bound (indicating that forecasts can be severely erroneous) and an upper bound of 1, which denotes a fully accurate prediction. When the score is close to 0, the model is comparable to a random estimate about the mean $\bar{y}$. Together, these metrics provide a comprehensive evaluation of the system model's performance, because each statistic captures a different aspect of the model's correctness and fit to the data.
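To make the three metrics concrete, the following minimal sketch computes them with scikit-learn; the RSSI arrays are hypothetical placeholders rather than values from our dataset:

```python
# Minimal sketch of Equations (1)-(3) using scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted RSSI values (dBm); placeholders only.
y_true = np.array([-60.1, -72.4, -55.0, -80.3, -67.5])
y_pred = np.array([-61.0, -70.9, -56.2, -79.8, -66.9])

mae = mean_absolute_error(y_true, y_pred)           # Equation (1)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # Equation (2)
r2 = r2_score(y_true, y_pred)                       # Equation (3)
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}, R2={r2:.3f}")
```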
We evaluate the performance of the algorithms for indoor localization by using the location error, defined as the Euclidean distance between the actual position $(x, y)$ and the estimated position $(\hat{x}, \hat{y})$:

$$e = \sqrt{\left( x - \hat{x} \right)^2 + \left( y - \hat{y} \right)^2} \quad (4)$$
First, we evaluate the REM construction based on RMSE, $R^2$, and MAE. In the ML algorithm, we set the following parameters: 50 for the maximum depth, 200 for the number of estimators, and 2 for the minimum sample split. Table 2 shows the performance of REM construction, which can be evaluated based on the errors obtained for the different regression techniques. We can see that the proposed ETR-based scheme achieved the lowest error among the comparative schemes, followed by the random forest and bagging regressors. However, the support vector and AdaBoost regression models showed higher errors, suggesting comparatively poorer performance in this specific scenario.
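As a minimal sketch (assuming a scikit-learn implementation; the data variable names are placeholders, not from our codebase), the stated configuration corresponds to:

```python
# Sketch of the ETR configuration described above; rem_features and
# rem_targets are hypothetical stand-ins for the RSSI measurement dataset.
from sklearn.ensemble import ExtraTreesRegressor

etr = ExtraTreesRegressor(
    n_estimators=200,     # 200 estimators
    max_depth=50,         # maximum depth of 50
    min_samples_split=2,  # minimum sample split of 2
    random_state=0,       # fixed seed for reproducibility (an added assumption)
)
# etr.fit(rem_features, rem_targets)  # train on RSSI samples to construct the REM
```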
We evaluated the performance of the proposed ETR-based scheme for indoor localization using the location error in Equation (4). In addition, we compared the performance of the ETR-based scheme with multiple regression techniques as alternative approaches to indoor localization.
Table 2. Performance comparison between the proposed ETR algorithm and other regression techniques based on REM error calculations.
| Algorithm | RMSE | $R^2$ | MAE |
| --- | --- | --- | --- |
| Extra Trees Regression | 0.997 | 0.975 | 0.421 |
| Random Forest Regression | 1.067 | 0.971 | 0.49 |
| Decision Tree Regression | 1.218 | 0.963 | 0.47 |
| Bagging Regression | 1.064 | 0.972 | 0.492 |
| Support Vector Regression | 2.977 | 0.779 | 2.317 |
| AdaBoost Regression | 2.874 | 0.794 | 2.301 |
The proposed scheme used K features and two labels to implement indoor localization. We then employed two ML-based regressors to handle the two target variables. Each ML regressor takes as features the RSSI values obtained in the K steps and predicts the corresponding target value. The first ML model predicts the position of the final K-th step in the x-coordinate, while the second ML model predicts the position in the y-coordinate. We evaluated the algorithm's performance by using the location error described in Equation (4). Multiple regression techniques were also used for comparison of algorithm performance.
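The following is a compact sketch of this two-regressor scheme, under the assumption of a scikit-learn implementation (all variable names are illustrative, not from our codebase):

```python
# Two ML regressors, one per coordinate, plus the location error of Equation (4).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fit_and_locate(X_train, x_train, y_train, X_test):
    """X_* hold (N, K) arrays of RSSI features over K steps; x_train and
    y_train are the final K-th-step coordinates used as the two labels."""
    model_x = ExtraTreesRegressor(n_estimators=200, max_depth=50, random_state=0)
    model_y = ExtraTreesRegressor(n_estimators=200, max_depth=50, random_state=0)
    model_x.fit(X_train, x_train)  # first model: x-coordinate of the K-th step
    model_y.fit(X_train, y_train)  # second model: y-coordinate of the K-th step
    return model_x.predict(X_test), model_y.predict(X_test)

def location_error(x_true, y_true, x_hat, y_hat):
    # Euclidean distance between true and estimated positions, Equation (4)
    return np.sqrt((x_true - x_hat) ** 2 + (y_true - y_hat) ** 2)
```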
This paper presents various figures showing the effect of the hyperparameters, namely the maximum depth and the number of estimators, on the different regression algorithms. Fine-tuning of the hyperparameters was performed to achieve optimal system performance. In Figure 8, the number of estimators was varied from 20 to 200, and location error calculations were performed using 10-fold cross-validation with the different regression techniques. This visual representation clearly demonstrates the superior performance of the ETR compared to the other regression techniques. Notably, at 140 estimators, the ETR exhibited a lower error rate, and its rate remained the lowest thereafter.
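The estimator sweep behind Figure 8 can be reproduced in outline as follows (a sketch with synthetic placeholder data standing in for the RSSI dataset):

```python
# 10-fold cross-validated error versus the number of estimators (20 to 200).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))  # synthetic stand-in: 600 samples, K = 10 features
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=600)

for n_estimators in range(20, 201, 20):
    model = ExtraTreesRegressor(n_estimators=n_estimators, random_state=0)
    # scikit-learn negates error scores by convention; flip the sign back
    scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_absolute_error")
    print(n_estimators, -scores.mean())
```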
Figure 9 compares the location error versus maximum depth for the ETR, random forest, and decision tree regressors. Once again, the ETR demonstrated better performance: the error gradually decreased after a maximum depth of 15, reaching its lowest rate at a depth of 40.
Simulations were conducted to determine how the number of samples affects the behavior of the ML-based regressors across different dataset sizes. Figure 10 illustrates the location error versus the number of samples. An array of regressors, including the ETR and the random forest, bagging, AdaBoost, support vector, and decision tree regressors, was employed to compare performance. We can see that the system performs optimally with a sample size of 600.
In conclusion, the best parameters for our proposed ETR-based model were selected by fine-tuning the hyperparameters of ML-based regressors, and the ETR showed better performance than the compared algorithms.
Next, we evaluate localization performance in indoor environments by using a cumulative distribution function (CDF) graph (Figure 11). The graph provides an insightful performance comparison among the various regression techniques used in the study, namely the ETR and the decision tree, random forest, AdaBoost, bagging, and support vector regressors. Upon analyzing the CDF graph, we can see that the proposed ETR outperformed the other regression methods in terms of localization accuracy, implying that the ETR-based algorithm consistently provided more precise estimations of user locations. By demonstrating the superior performance of the ETR approach, the CDF graph highlights its efficacy in achieving accurate localization results and its potential for real-world applications requiring precise location estimation. Notably, the system exhibits a remarkable level of precision: approximately 90% of localization errors fall below the three-meter threshold.
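For reference, the empirical CDF and the 3 m statistic can be computed as in the following sketch (the error array here is synthetic, standing in for the per-sample outputs of Equation (4)):

```python
# Empirical CDF of location errors and the fraction below the 3 m threshold.
import numpy as np

rng = np.random.default_rng(0)
errors = np.abs(rng.normal(1.5, 0.8, size=1000))  # synthetic placeholder errors (m)

errors = np.sort(errors)                           # ascending order for the CDF
cdf = np.arange(1, errors.size + 1) / errors.size  # cumulative fraction per error
within_3m = np.mean(errors <= 3.0)                 # share of errors under 3 m
print(f"{within_3m:.0%} of localization errors fall below the 3 m threshold")
```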
6.2. Computational Complexity Analysis
This subsection examines the computational complexity of the proposed ETR-based approach and of the random forest regressor and bagging regressor comparison systems. The results show that the number of regression trees, the number of features, the number of samples, and the maximum depth of the trees affect the computational complexity of the proposed ETR. In further detail, the computational complexity of the ETR may be approximated by $O(V \cdot K \cdot P \cdot D)$, where $V$ denotes the number of trees, $K$ is the number of features, $P$ is the number of training samples, and $D$ is the maximum tree depth. When choosing the optimal split in our simulations, we considered all $K$ features and set the maximum tree depth to $D = 50$.
The computational complexity of the random forest regressor is similarly determined by $O(V \cdot K \cdot P \cdot D)$ [38]. Nevertheless, compared to the random forest regressor, the ETR takes less time to compute, since it employs a random threshold rather than searching for the best practicable threshold to split the data at each node. For the bagging regressor, the computational complexity can be expressed by $O(B \cdot C)$ [38], where $B$ is the total number of base regressors, and $C$ is the computational complexity of training a base regressor. We utilized the decision tree regressor, which has a complexity of $O(K \cdot P \cdot D)$, as the base estimator in our simulations.
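For a rough sense of scale, substituting the simulation settings reported above ($V = 200$ trees, $P = 600$ samples, $D = 50$; the feature count $K$ is left symbolic, since it depends on the deployment) into the approximation gives

$$O(V \cdot K \cdot P \cdot D) = O(200 \cdot K \cdot 600 \cdot 50) = O(6 \times 10^{6} \, K)$$

The same order applies to the random forest regressor; the ETR's advantage lies in the smaller constant factor hidden by the $O(\cdot)$ notation, owing to its random split thresholds.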