1. Introduction
The rapid development of global digitization has created a high demand for location-based services (LBS) in many industries [
1]. These services have become essential for various systems and applications, including transportation, logistics, emergency response, etc. In outdoor environments, mobile users already have access to established outdoor positioning technologies such as the Global Positioning System (GPS) [
2] and the BeiDou Satellite Navigation System (BDS) [
3] to obtain accurate location information. However, these technologies are often far less effective indoors, because satellite signals are scattered and attenuated.
In the field of indoor localization, various wireless signals have been proposed and utilized, including WiFi [
4,
5,
6,
7], Bluetooth [
8,
9], Ultra-Wideband (UWB) [
10,
11], Radio Frequency Identification (RFID) [
12], and custom radios [
13]. Typical ranging-based methods for processing wireless signals in indoor localization involve using information such as Angle of Arrival (AOA) or Time of Arrival (TOA) to estimate the specific positions of the user equipment (UE) [
14]. However, these methods require prior knowledge of the locations of access points (APs) and are susceptible to errors in the distance measurement between the UE and APs, which can negatively impact the accuracy of the positioning. In contrast to these methods, the fingerprint-based indoor localization method is characterized by simplicity and efficiency [
15]. This technique relies on the unique characteristics of wireless signals in indoor environments to create a map or "fingerprint" of the Received Signal Strength Indicator (RSSI) at different locations. The fingerprint can then be used to estimate the position of the UE based on the signal strengths measured at that location. Fingerprint-based methods are highly accurate and can offer sub-meter-level positioning accuracy in many cases, making them a promising alternative to ranging-based methods. However, the radio propagation environment introduces multipath effects, shadowing, signal fading, and other forms of signal degradation and distortion, which lead to significant fluctuations in RSSI values. In the experiments described in this paper, the observed RSSI values for different APs at a fixed location fluctuate over a wide range, as illustrated in
Figure 1. The fluctuation in RSSI makes it challenging to discern the pattern of RSSI between the test points (TPs) and reference points (RPs), thereby significantly impacting the accuracy of positioning.
With the development of machine learning over the past few decades, numerous algorithms have proven effective in recognizing RSSI patterns [
16]. Brunato et al. proposed applying Support Vector Machines (SVM) in location fingerprint positioning systems [
17]. Hoang et al. introduced a soft range-limited k-Nearest Neighbors (KNN) fingerprinting algorithm that addresses spatial ambiguity in localization by scaling the fingerprint distance with a range factor based on the physical distance between the previous position of users and the reference location in the database [
18]. Fang et al. utilized Feedforward Neural Networks (FNN) to extract fingerprint features from the RSSI, enabling accurate localization of the actual position [
19]. However, the performance of these algorithms can easily be limited when learning features in complex indoor environments. To achieve superior performance, some research studies have suggested using Long Short-Term Memory (LSTM) for handling sequential trajectory prediction in indoor localization systems [
4,
20,
21], which has been experimentally demonstrated to be more effective than the conventional KNN method. Meanwhile, self-attention has been proposed as a promising technique for enhancing the performance of sequence processing tasks [
22,
23,
24,
25]. By enabling the model to attend to various regions of the input sequence, self-attention improves its capacity to capture the connections between various features in a sequence.
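To make the idea concrete, the following is a minimal sketch of scaled dot-product self-attention applied to a short sequence of RSSI vectors. It assumes identity query, key, and value projections (no learned weights), so it only illustrates the mechanism and should not be read as the architecture proposed in this paper. The function names and the toy RSSI values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    seq: list of T feature vectors (each a list of d floats).
    Returns (outputs, weights), where weights[t][s] is how strongly step t
    attends to step s, and outputs[t] is the weighted sum of all steps.
    """
    d = len(seq[0])
    outputs, weights = [], []
    for q in seq:
        # Similarity of this step to every step, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[s] * seq[s][j] for s in range(len(seq)))
                        for j in range(d)])
    return outputs, weights

# Toy input: 3 time steps, RSSI from 2 APs, normalized to [0, 1] (made-up values).
rssi_seq = [[0.2, 0.8], [0.25, 0.75], [0.9, 0.1]]
attended, attn = self_attention(rssi_seq)
```

In this toy input, the first time step attends more strongly to the second step, whose RSSI vector is similar, than to the third, which is the kind of dependency between time instances that the attention weights are meant to expose.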
This paper introduces a novel method named SA-LSTM (Self-Attention and LSTM) that effectively improves positioning accuracy and robustness. We conducted experiments in two different scenarios to validate the effectiveness and robustness of the proposed approach. The experimental results demonstrate that SA-LSTM exhibits greater robustness and higher accuracy in indoor localization compared to some of the most advanced algorithms.
The main contributions of this paper are as follows:
We propose a novel SA-LSTM model that integrates the self-attention mechanism and LSTM networks. SA-LSTM treats the localization problem as a sequence learning task. It processes the RSSI values of consecutive time instances and predicts the position at the final moment in the input sequence. The self-attention mechanism enables the LSTM to more effectively capture the interdependencies between the RSSI values at different time instances, thereby facilitating improved extraction of location information and reducing the localization error.
We validate the performance of the proposed SA-LSTM model in two distinct experimental environments. The first scenario uses Bluetooth RSSI data that we collected while moving along 2D trajectories on a single floor. The second uses an open-source WiFi RSSI dataset containing 3D-moving trajectories across various floors within a building.
We conduct a comparative analysis between our proposed model and several state-of-the-art methods. The experimental results reveal that our proposed SA-LSTM model achieves the highest localization accuracy in both experimental scenarios, demonstrating its robustness and precision.
The rest of this paper is structured as follows. Section 2 provides an overview of related works in the area of fingerprint indoor localization systems. Section 3 presents the technical details of our proposed model. Section 4 outlines the experimental setup utilized in our study. Section 5 presents and analyzes the experimental results obtained from various datasets. Finally, Section 6 offers concluding remarks and outlines our future research plans.
2. Related Work
In this section, we present an overview of the existing research on fingerprint-based indoor localization and the application of self-attention mechanisms.
The authors in [
26] proposed applying the KNN algorithm for the first time in the field of fingerprint-based indoor localization. According to the article, the RSSI values from multiple base stations were recorded and processed as reference points stored in a database. During the testing phase, the positions of testing points were determined using the Euclidean distance. On average, the system achieved an accuracy of approximately 3 m, with 75% of localization errors falling below 4.7 m.
An improved version of the KNN method for indoor localization is the weighted KNN (WKNN), which was introduced by Brunato and Battiti [
27]. In that paper, the position of a user is estimated as a weighted average of the positions of the nearest reference points, with weights derived from the RSSI distance between each neighbor and the current measurement. Tests performed in a real-world environment showed that the WKNN method achieved an accuracy of 3.1 ± 0.1 m, with the added benefit of low algorithmic complexity.
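The KNN and WKNN estimators discussed above can be sketched in a few lines. This is an illustrative sketch only: the inverse-distance weighting is a common choice that is assumed here rather than taken from the cited works, and the function name, the `eps` parameter, and the small fingerprint database are hypothetical.

```python
import math

def wknn_locate(fingerprints, query, k=3, weighted=True, eps=1e-6):
    """Estimate a 2D position from an RSSI query using KNN or WKNN.

    fingerprints: list of (rssi_vector, (x, y)) reference points.
    query: RSSI vector measured at the unknown position.
    weighted=False gives plain KNN (unweighted mean of neighbor positions).
    """
    # Euclidean distance in signal space between the query and each reference point.
    dists = [(math.dist(rssi, query), pos) for rssi, pos in fingerprints]
    nearest = sorted(dists, key=lambda item: item[0])[:k]
    # Inverse-distance weights: closer fingerprints count more (eps avoids 1/0).
    weights = [1.0 / (d + eps) if weighted else 1.0 for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y

# Made-up fingerprint database: RSSI (dBm) from 2 APs at 4 reference points.
db = [
    ([-40.0, -70.0], (0.0, 0.0)),
    ([-50.0, -60.0], (2.0, 0.0)),
    ([-60.0, -50.0], (4.0, 0.0)),
    ([-70.0, -40.0], (6.0, 0.0)),
]
knn_xy = wknn_locate(db, [-42.0, -68.0], k=2, weighted=False)
wknn_xy = wknn_locate(db, [-42.0, -68.0], k=2, weighted=True)
```

With this toy database, plain KNN returns the midpoint of the two nearest reference points, while WKNN pulls the estimate toward the reference point whose fingerprint is closer to the query.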
Khassanov et al. explored the use of end-to-end sequence models for WiFi-based indoor localization at a finer level [
4]. The study showed that the localization task can be effectively formulated as a sequence learning problem using Recurrent Neural Networks (RNN) with regression output. The use of regression output allows for estimating three-dimensional positions and enables scalability to larger areas. The experiments conducted on the WiFine dataset reveal that RNN models outperform non-sequential models such as KNN and FNN, achieving an average positioning error of 3.05 m for finer-level localization tasks.
Furthermore, Chen et al. proposed a deep LSTM network for indoor localization using WiFi fingerprinting [
28]. The network incorporates a local feature extractor that enables the encoding of temporal dependencies and the learning of high-level representations based on the extracted sequential local features. The experimental results demonstrate that the proposed approach achieves state-of-the-art localization performance, with mean localization errors of 1.48 m and 1.75 m in research lab and office environments, respectively.
To address neural machine translation tasks, Bahdanau et al. introduced the attention mechanism to the encoder-decoder model. This enables the model to learn alignment and translation simultaneously, allowing for adaptive selection of encoded vectors [
29]. The proposed approach exhibits substantial improvements in translation performance compared to the basic encoder-decoder approach, especially with longer sentences. Furthermore, an LSTM structure based on self-attention mechanism was introduced in [
30]. The overall results demonstrate the superiority of the proposed method in forecasting temporal sequences compared to other benchmark methods.
In general, LSTM has demonstrated exceptional performance in sequence prediction tasks, including fingerprint localization. It has been experimentally verified that it outperforms conventional methods such as KNN and WKNN. Additionally, the self-attention mechanism enables the model to consider the relationship between each element in the sequence. This leads to a better understanding of contextual information and more precise processing of sequence data. Based on that, we propose an SA-LSTM model with high accuracy and strong robustness for indoor localization systems based on fingerprinting.
5. Results and Discussion
Before comparing the performance of various methods, the sliding window length
L for the SA-LSTM method needs to be determined.
Figure 9 illustrates the mean positioning error as a function of the window size. As shown in the figure, SA-LSTM performs poorly when
L is set to 1 or 2. As
L increases, the average localization error of SA-LSTM shows a significant decrease. This occurs because when
L is set to a smaller value, the network model obtains less information, resulting in lower positioning accuracy. When
L is increased to 5 or 6, the average localization error only fluctuates within a small range, so a longer window brings little further benefit. To avoid additional computational complexity,
L is therefore set to 4.
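As a rough illustration of how a window length L turns a trajectory into training samples, the sketch below pairs each run of L consecutive RSSI vectors with the position at the last time step of that run. It is a minimal sketch under the assumption of a stride of one; the function name and the toy trajectory are hypothetical and not taken from the experimental code.

```python
def make_windows(rssi_series, positions, window_len=4):
    """Pair each run of `window_len` consecutive RSSI vectors with the
    position at the last time step of that run (stride of one)."""
    samples = []
    for end in range(window_len, len(rssi_series) + 1):
        window = rssi_series[end - window_len:end]
        samples.append((window, positions[end - 1]))
    return samples

# Made-up trajectory: 6 time steps, RSSI (dBm) from 2 APs, positions along a line.
rssi = [[-40, -70], [-42, -68], [-45, -66], [-47, -63], [-50, -60], [-53, -58]]
pos = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (2.5, 0.0)]
windows = make_windows(rssi, pos, window_len=4)
```

A trajectory of T time steps yields T - L + 1 samples, so a larger L gives the network more context per sample at the cost of longer input sequences and fewer samples per trajectory.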
To benchmark our indoor localization approach, we implemented the LSTM-based indoor localization network described in [
28]. Additionally, we implemented other methods, namely RNN [
4], KNN, WKNN, FNN, and Linear Regression. The parameters of these models were tuned within a certain range to optimize their performance. During training, every model was evaluated on the validation set after each epoch, and the model with the minimum average positioning error was saved for further evaluation.
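The model-selection rule described above can be sketched as follows. This is a simplified illustration that works on a list of per-epoch validation errors instead of a real training loop; the function name and the error values are hypothetical.

```python
def select_best_epoch(val_errors):
    """Return (epoch, error) for the epoch with the lowest validation error.

    val_errors: average positioning error on the validation set after each
    epoch, in training order. Epochs are numbered from 1.
    """
    best_epoch, best_err = None, float("inf")
    for epoch, err in enumerate(val_errors, start=1):
        if err < best_err:
            # In a real training loop, the model weights would be saved here.
            best_epoch, best_err = epoch, err
    return best_epoch, best_err

# Made-up validation errors (metres) for five epochs.
best = select_best_epoch([4.2, 3.1, 2.6, 2.9, 2.7])
```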
The average and maximum positioning errors of all these methods are presented in
Table 3. The SA-LSTM method outperforms the other methods in terms of average positioning accuracy. The LSTM approach achieves the second-best mean positioning accuracy, following the proposed SA-LSTM method. On the test set, the LSTM method yields a maximum error of 13.73 m and an average error of 3.07 m, which are 0.98 m and 1.31 m higher, respectively, than those of the proposed SA-LSTM method. Compared to the RNN method, which has a mean positioning error of 4.16 m and a maximum error of 12.64 m, SA-LSTM reduces the mean and maximum errors by 2.4 m and 0.29 m, respectively. Moreover, the largest improvement in average positioning accuracy achieved by SA-LSTM is 66.85%, obtained relative to the Linear Regression method.
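For reference, the average and maximum positioning errors can be computed as in the sketch below. It assumes that the positioning error of a sample is the Euclidean distance between the estimated and reference positions, which this section does not state explicitly; the function name and the coordinates are hypothetical.

```python
import math

def positioning_errors(estimated, reference):
    """Euclidean distance between each estimated and reference position."""
    return [math.dist(e, r) for e, r in zip(estimated, reference)]

# Made-up estimated and reference positions (metres).
est = [(1.0, 1.0), (2.0, 3.0), (5.0, 5.0)]
ref = [(1.0, 2.0), (2.0, 3.0), (2.0, 1.0)]

errs = positioning_errors(est, ref)
mean_err = sum(errs) / len(errs)  # average positioning error
max_err = max(errs)               # maximum positioning error
```

The same function works unchanged for 3D positions, since `math.dist` accepts points of any matching dimension.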
Figure 10 illustrates the MSE loss curves of the SA-LSTM and LSTM methods during the training process with 2D-moving trajectories. Our results indicate that SA-LSTM exhibits a faster convergence rate in terms of training loss compared to the LSTM model. Moreover, after 200 epochs of training, the training loss of SA-LSTM converges to around 0, while the training loss of LSTM converges to around 0.5. The validation loss of SA-LSTM also approaches its stable value faster than that of LSTM, as shown in the black dotted box in
Figure 10. Throughout the entire training process, the SA-LSTM model achieved a slightly lower minimum validation loss than the LSTM model. These results suggest that the self-attention mechanism and the shortcut connection make the SA-LSTM model more efficient to train.
Figure 11 illustrates the cumulative distribution function (CDF) of localization errors for the 2D-moving experiment. The maximum localization error is 12.35 m for SA-LSTM, 15.22 m for KNN, and 15.42 m for WKNN, which is the largest. Compared to the KNN and WKNN methods, SA-LSTM reduces the maximum localization error by 2.87 m and 3.07 m, respectively. Meanwhile, the maximum localization error of LSTM is 12.47 m, which is also higher than that of SA-LSTM. At the 90th percentile of the CDF, the proposed SA-LSTM model has a localization error of approximately 3.86 m. In comparison, the LSTM, RNN, and KNN models have 90th-percentile errors of around 4.36 m, 5.74 m, and 6.31 m, respectively. Relative to these baselines, SA-LSTM therefore reduces the 90th-percentile error by 11.47%, 32.75%, and 38.83%, respectively.
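The sketch below shows one way to obtain an empirical CDF, a 90th-percentile error, and a relative reduction from a list of localization errors. It assumes the nearest-rank percentile definition and a reduction measured relative to the baseline; the function names and the sample errors are hypothetical, while the 4.36 m and 3.86 m values are the LSTM and SA-LSTM figures quoted above.

```python
def empirical_cdf(errors):
    """Return the sorted errors and the fraction of samples at or below each."""
    xs = sorted(errors)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def percentile_error(errors, percent=90):
    """Smallest error e such that at least `percent` % of samples are <= e
    (nearest-rank definition, integer arithmetic to avoid rounding issues)."""
    xs = sorted(errors)
    rank = max(1, (percent * len(xs) + 99) // 100)  # ceil(percent * n / 100)
    return xs[rank - 1]

def relative_reduction(baseline, proposed):
    """Reduction of `proposed` relative to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# Made-up localization errors (metres) for ten test points.
errors = [0.5, 1.2, 0.8, 3.9, 2.1, 1.7, 0.9, 2.8, 1.1, 6.0]
xs, cdf = empirical_cdf(errors)
p90 = percentile_error(errors)

# 90th-percentile errors quoted in the text: LSTM 4.36 m, SA-LSTM 3.86 m.
lstm_gain = relative_reduction(4.36, 3.86)
```

Here `lstm_gain` evaluates to roughly 11.47%, which matches the improvement over LSTM reported above.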
Regarding the 3D-moving experiment, the proposed SA-LSTM model continues to exhibit superior performance in the localization system. Similarly, we compare the average and maximum positioning error of KNN, WKNN, FNN, Linear Regression, RNN, LSTM and SA-LSTM. As shown in
Table 4, the proposed SA-LSTM achieves an average positioning error of 2.83 m and a maximum positioning error of 57.64 m in the 3D-moving experiment. Compared to LSTM, SA-LSTM improves the average positioning accuracy by 31.64%. In addition, SA-LSTM reduces the average positioning error by 2.1 m and the maximum localization error by 3.32 m compared to RNN. Compared to KNN and WKNN, SA-LSTM has an average positioning error that is 0.62 m and 0.61 m lower, respectively. SA-LSTM thus achieves both the lowest average and the lowest maximum positioning error in the 3D-moving scenario.
The loss curves for SA-LSTM and LSTM in the 3D-moving experiment are depicted in
Figure 12. The training losses of SA-LSTM and LSTM converge at a similar rate. As shown in the zoomed-in image in
Figure 12, the final convergence value of SA-LSTM is slightly lower. In terms of the validation loss, the SA-LSTM model performs better than the LSTM model: the validation loss of SA-LSTM eventually converges to about 3, while that of LSTM remains above 4. Based on these findings, we conclude that the proposed SA-LSTM model is more efficient to train than the conventional LSTM model.
Figure 14 illustrates the CDF of localization errors for the 3D-moving experiment. Overall, the proposed SA-LSTM still outperforms the other classical algorithms. The LSTM network performs second best, achieving a 90% localization error below 6 m, while RNN achieves a 90% localization error below 8.45 m. Compared to LSTM and RNN, SA-LSTM reduces the 90% localization error by 1.99 m and 4.44 m, respectively.
Figure 13.
Schematic diagram of referenced and estimated trajectories with a range of movement involving (a) two floors and (b) three floors.
Furthermore, a couple of estimated trajectories are drawn in a 3D-moving experiment using the SA-LSTM model.
Figure 13 (a) and
Figure 13 (b) depict the moving trajectories, which involve transitions between two and three different floors, respectively. The red lines correspond to the reference trajectory, whereas the blue lines depict the estimated trajectories generated by SA-LSTM. The experimental results indicate that the measured position points in the referenced trajectories exhibit anomalous behavior during pedestrian transitions between different floors. This behavior is attributed to the reliance on elevators for inter-floor movement, which leads to abnormal fluctuations in the measurement signal, resulting in anomalous measured positions. From the trajectories shown in
Figure 13 (a) and
Figure 13 (b), it can be seen that the proposed SA-LSTM model performs satisfactorily when the pedestrians under test move within a single floor. However, when pedestrians move between floors, the estimated positions generated by the SA-LSTM model may fluctuate within a narrow range. Nevertheless, once the pedestrians reach a specific floor, the SA-LSTM model promptly resumes effective operation.
The 90% quantile of CDF is an important performance evaluation metric in location systems, as highlighted in 3GPP Rel.18 [
35]. To comprehensively evaluate the performance of each algorithm in both 2D-moving and 3D-moving experiments, we calculate the 90% error for each algorithm and present the results in
Figure 15.
In both experimental scenarios, SA-LSTM demonstrates the highest localization accuracy compared to the other algorithms, as indicated by its low 90% positioning error. In the 2D-moving experiment, SA-LSTM achieves a 90% localization error of about 3.86 m, which is 0.5 m and 1.88 m lower than that of LSTM and RNN, respectively. Compared to the classical KNN algorithm, the SA-LSTM model consistently exhibits a lower 90% positioning error in both experimental environments. These results suggest that SA-LSTM offers high accuracy and stability for indoor positioning, highlighting its potential to outperform traditional methods and to enable more reliable indoor positioning systems.