1. Introduction
Human Activity Recognition (HAR) aims to identify the physical movements of people, enabling intelligent systems to assist individuals in improving their quality of life. This technology has numerous indoor and outdoor applications, including smart homes, healthcare, public safety, and sports [
1]. Additionally, the proliferation of devices such as smartphones, smartwatches, and fitness trackers has paved the way for numerous new applications. These devices capture a rich amount of contextual data, facilitating remote patient monitoring, identifying and preventing potential hazards like falls, promoting healthier lifestyles, and creating automated activity records [
2]. Such advancements are particularly beneficial for elderly care, aiding their independent living by ensuring safety and timely interventions. Despite recent advancements in HAR technology, challenges persist in accuracy, real-time processing, data scarcity, computational complexity, and user privacy.
Building on the importance of HAR, especially in the context of independent living for the elderly, two primary technologies have emerged for data acquisition: vision-based and sensor-based systems. Vision sensing relies on high-resolution cameras coupled with advanced computer vision techniques. However, it faces challenges such as privacy concerns and quality degradation due to lighting conditions and camera limitations [
3]. In contrast, sensor-based techniques offer both wearable and non-wearable solutions. Non-wearable sensors, especially those utilising radio frequencies (RF) like channel state information (CSI) or received signal strength indicator (RSSI), are gaining popularity for indoor human activity monitoring due to their non-invasive and privacy-conscious nature [
4,
5]. Wearable sensors, including pedometers, accelerometers, and gyroscopes, remain popular choices for HAR, with smartphones and smartwatches emerging as preferred devices for activity tracking.
The decision to use vision or sensor-based systems depends on application requirements, environment, and user preferences, each with its own advantages and challenges. However, wearable sensors provide more accurate data for HAR as they directly capture detailed human movements. This precision is crucial in dynamic environments, and is particularly vital for elderly care, where individuals may engage in activities that vary in intensity and nature [
6]. On the other hand, non-invasive sensing techniques, such as video, present significant privacy concerns, making them less suitable for applications where user privacy is paramount. Moreover, establishing infrastructure for outdoor HAR using vision or RF sensing presents challenges, often due to environmental factors, equipment costs, and maintenance requirements. Additionally, non-invasive RF-based systems, while promising, still face hurdles in achieving high accuracy, especially when monitoring multiple individuals simultaneously [
7]. The decision between these techniques requires a careful balance of accuracy, privacy, and feasibility based on the specific context of the application.
Traditionally, HAR systems have operated predominantly in a centralised architecture, as depicted in
Figure 1. In such setups, various sensors collect data from multiple participants and share it with a central server or cloud infrastructure for processing and analysis [
8]. This centralised data processing suffers from several limitations, especially as the volume and variety of data sources expand. With the advent of advanced data analytics, deep learning (DL) has emerged as a powerful tool for HAR, enabling the extraction of intricate patterns directly from raw sensor data, thereby eliminating the need for manual feature engineering. DL models, such as convolutional neural networks (CNNs) [
9] and recurrent neural networks (RNNs) [
10,
11], have shown remarkable success in HAR applications, often outperforming traditional machine learning techniques. However, the adoption of DL approaches in HAR is not without challenges. One of the primary concerns is data scarcity, especially labelled data, which is crucial for training DL models. The process of labelling vast amounts of data is labour-intensive and often requires domain expertise [
3]. Furthermore, the centralised nature of HAR systems poses significant communication and storage costs, especially when transmitting high-dimensional raw data [
12]. Additionally, processing this data in centralised servers can incur additional latency, especially when dealing with real-time activity recognition tasks. More critically, centralising user data exposes individuals to potential privacy breaches, a concern that has gained prominence in the age of stringent data protection regulations [
13].
To overcome the limitations of centralised model training, federated learning (FL) has emerged as a promising distributed learning paradigm. FL enables multiple participants to collaboratively train a model without sharing their raw data [
14]. This distributed learning architecture offers privacy by design, reduces communication and storage overhead, and ensures real-time processing, a crucial requirement for HAR tasks [
8]. Furthermore, the participation of multiple clients in FL offers significant advantages. Each participant, with their distinct data, contributes to the overall model, resulting in a more generalised and robust global model that captures a broader range of human activities. FL enables real-time processing by dividing computational tasks among different devices. Its decentralised architecture is scalable and can accommodate various devices and data sources. Additionally, FL’s collaborative training approach allows personalisation, which is achieved by fine-tuning the global model using local data, improving the accuracy and relevance of activity recognition. These personalised models use individual-specific data to provide more precise and context-aware activity recognition, aligning the system’s predictions with the user’s unique patterns and behaviours.
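The collaborative training described above hinges on an aggregation step at the server. A minimal FedAvg-style sketch is shown below; the function name and toy single-layer weights are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg-style aggregation).

    client_weights: list of per-client parameter lists (one np.ndarray per layer).
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Each client's contribution is weighted by its share of the data.
        agg = sum((n / total) * w[layer] for w, n in zip(client_weights, client_sizes))
        averaged.append(agg)
    return averaged

# Two toy clients sharing a one-layer "model"; client B holds 3x the data.
w_a = [np.array([1.0, 1.0])]
w_b = [np.array([3.0, 3.0])]
global_w = fedavg([w_a, w_b], client_sizes=[1, 3])
```

Because aggregation is weighted by local dataset size, clients with more data pull the global model closer to their local optimum, which matters under the unbalanced splits used later in this paper.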
In the realm of HAR, the need for distributed learning is becoming increasingly important, especially given the challenges associated with centralised systems. As we transition towards more decentralised and edge-based processing, the computational demands of traditional DL models can become a significant bottleneck, especially on resource-constrained edge devices [
15]. While FL offers several advantages, one major drawback is that commercially available hardware often struggles to support this distributed intelligence in an energy-efficient manner. To overcome this challenge, neuromorphic computing emerges as a potential solution. Inspired by biological neural systems, neuromorphic computing promises energy-efficient and rapid signal processing. Spiking neural networks (SNNs), a subset of neuromorphic learning, are gaining attention due to their unique event-driven processing of binary inputs, known as 'spikes' [
16]. Unlike traditional DL models, SNNs operate on a temporal, event-driven paradigm, making them particularly suitable for on-device learning. The real-time and continuous nature of activity data in HAR accentuates the potential advantages of neuromorphic computing, highlighting the necessity for models that can adeptly capture the temporal dynamics of human activities.
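The event-driven behaviour of a spiking neuron can be illustrated with a minimal leaky integrate-and-fire (LIF) simulation; the decay factor, threshold, and input values below are illustrative assumptions:

```python
def lif_neuron(input_current, beta=0.9, threshold=1.0):
    """Simulate a leaky integrate-and-fire (LIF) neuron over discrete time steps.

    The membrane potential decays by factor `beta`, integrates the input at
    each step, and emits a binary spike (then resets) when it crosses
    `threshold` -- the event-driven 'spike' processing used by SNNs.
    """
    v = 0.0
    spikes = []
    for i in input_current:
        v = beta * v + i          # leaky integration of the input current
        if v >= threshold:
            spikes.append(1)      # fire a spike
            v -= threshold        # soft reset of the membrane potential
        else:
            spikes.append(0)
    return spikes

spike_train = lif_neuron([0.6, 0.6, 0.6, 0.0, 0.9])
```

Because the neuron only emits a 1 when enough input has accumulated, downstream computation is sparse and event-driven, which is the source of the energy savings claimed for neuromorphic hardware.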
Although SNNs are computationally efficient, traditional DL models such as LSTM networks are more effective in processing sequential data [
17]. Given these considerations, a compelling need emerges for a model that synergistically combines the strengths of SNNs and LSTMs. Therefore, we introduce the hybrid neuromorphic federated learning (HNFL) approach, which integrates the SNN with LSTM, creating a Spiking-LSTM (S-LSTM) model. The S-LSTM is designed to leverage the computational efficiency of SNNs while harnessing the sequential data processing capabilities of LSTMs. This fusion offers a balance between efficiency and accuracy, positioning the S-LSTM as a promising model for HAR in a federated setting. To the best of the authors' knowledge, no prior research has presented such a hybrid model for HAR. The key contributions of this paper are as follows:
We introduce a novel HNFL framework tailored for HAR using wearable sensing technology. The hybrid design of S-LSTM seamlessly integrates the strengths of both LSTM and SNN in a federated setting, offering privacy preservation and computational efficiency.
A comprehensive analysis is conducted using two distinct publicly available datasets, and the results of the S-LSTM are compared with spiking CNN (S-CNN) and simple CNN. This dual-dataset testing approach validates the robustness of the proposed framework and provides valuable insights into its performance in varied environments and scenarios.
This study addresses the significant issue of client selection within the context of federated HAR applications. We conduct a thorough investigation into the implications of random client selection and its impact on the overall performance of the HAR model. This analysis provides valuable insights into achieving the optimal balance between computational and communication efficiency and model accuracy, which guides the choice of client-selection strategy in federated scenarios.
The rest of the paper is organised as follows:
Section 2 introduces the related work for FL-based HAR. In
Section 3, the preliminaries and system model are discussed, whereas
Section 4 explains the simulation setup.
Section 5 provides the details on results and discussion, and
Section 6 concludes the research findings.
4. Simulation Setup
This section provides a detailed discussion of the datasets, performance evaluation strategy, and metrics used in this study.
4.1. Dataset Description
Despite HAR being a well-investigated topic, attempts to evaluate it using smartphone data are a recent and very active area of research. Several datasets have been collected using smartphones, which pose challenges related to sensor configuration, sampling frequency, accessibility, realism, size, heterogeneity, and annotation quality. Additionally, there is extreme class imbalance due to the stark differences in activity patterns between classes. Thus, HAR is an ideal testbed for assessing neuromorphic federated learning (NFL) in practical heterogeneous contexts. Furthermore, our focus was on reproducibility, heterogeneity, and realistic datasets, which led us to select two publicly available datasets. The UCI dataset [
31], which is one of the most commonly used in HAR benchmarking studies, was chosen first. However, UCI was collected in a strictly controlled laboratory environment, and the sample size was also very limited. Therefore, we also employed the Real-World dataset [
32], recorded outdoors without restrictions. The details of these two datasets are explained below:
UCI dataset:
The UCI dataset was obtained using Samsung Galaxy S II smartphones worn by 30 volunteers from different age groups and genders. The volunteers were engaged in six daily routine activities: sitting, standing, lying down, walking, walking downstairs, and walking upstairs. Each subject repeated these activities multiple times under two distinct device-placement scenarios: a smartphone on the left wrist, and each subject's preferred position. The smartphone's embedded accelerometer and gyroscope sensors captured triaxial linear acceleration and angular velocity at a rate of 50 Hz. The raw signals were pre-processed to minimise noise interference, and 17 distinctive signals were extracted, encompassing various time and frequency domain parameters, such as magnitude, jerk, and the Fast Fourier Transform (FFT). For analysis, signals were segmented into windows of 2.56 seconds with 50% overlap, culminating in 561 diverse features per window derived from statistical and frequency measures. This dataset contains 10,299 instances, split 70% for training and 30% for testing.
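The windowing step above can be sketched as follows. At 50 Hz, a 2.56 s window corresponds to 128 samples, and a 50% overlap to a 64-sample stride; the helper name and the random input stream are illustrative assumptions:

```python
import numpy as np

def sliding_windows(signal, fs=50, win_sec=2.56, overlap=0.5):
    """Segment a (samples, channels) signal into fixed-length overlapping windows.

    With fs=50 Hz and win_sec=2.56 s, each window holds 128 samples;
    a 50% overlap advances the window start by 64 samples.
    """
    win = int(fs * win_sec)            # 128 samples per window
    step = int(win * (1 - overlap))    # 64-sample stride
    n = (len(signal) - win) // step + 1
    return np.stack([signal[i * step : i * step + win] for i in range(n)])

x = np.random.randn(1000, 3)           # e.g. a triaxial accelerometer stream
windows = sliding_windows(x)           # shape: (n_windows, 128, 3)
```

Each window is then reduced to the 561 statistical and frequency-domain features described above before being fed to the classifier.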
For our experiments, the original splits are merged and repartitioned into five subsets, forming a local dataset for each participant. The data distribution among participants is kept highly unbalanced to reflect a realistic FL scenario. Each client's dataset is further divided into training (80%) and testing (20%) splits, and the test splits of all clients are combined to create a global test set for performance evaluation.
Real-World dataset:
Although the UCI dataset is commonly used in HAR studies, it has limitations: it was collected in a controlled laboratory environment, and its sample size is too small to explore the true potential of FL. Hence, we chose a more realistic dataset collected by Sztyler and Stuckenschmidt [
32]. The data was gathered from 15 participants (eight male, seven female) executing eight common real-world activities: jumping, running, jogging, climbing stairs, lying, standing, sitting, and walking. Accelerometer data was collected from seven body locations: the head, chest, upper arm, wrist, forearm, thigh, and shin. During data collection, smartphones and a smartwatch were mounted at these anatomical sites, recording at a frequency of 50 Hz. The dataset incorporates 1065 minutes of accelerometer measurements per on-body position per axis, amounting to a substantial data volume.
Additionally, the Real-World dataset is well-suited for HAR study as it was captured in a naturalistic environment and exhibits realistic class imbalance. For instance, the jumping activity comprises only 2% of the data, whereas standing constitutes 14%. This high class imbalance, together with the availability of separated per-user data, makes the dataset an appropriate choice for an extensive study of FL approaches for HAR.
4.2. Performance Metrics
HAR is treated as a multi-class classification problem, and various metrics are used to evaluate model performance. One commonly used metric is accuracy, defined as the ratio of correctly predicted instances to the total number of cases. However, the limitations of accuracy become pronounced on highly imbalanced datasets. For example, where one class dominates a dataset, a model may achieve high accuracy by simply predicting that class, disregarding the distribution of other classes. This phenomenon is known as the accuracy paradox, highlighting the risk of relying solely on accuracy as a performance metric when dealing with imbalanced datasets. Therefore, we additionally evaluate the model using precision, recall, and F1-Score, defined as follows:
Precision: This metric quantifies the number of correct positive predictions made by the model relative to the total number of positive predictions, mathematically represented as:

Precision = TP / (TP + FP)

where TP and FP are true positives and false positives, respectively.
Recall (Sensitivity): This metric measures how well the model can correctly identify positive instances, which is particularly important in contexts where missing positive instances (false negatives) can have serious consequences. Recall is mathematically represented as:

Recall = TP / (TP + FN)

where FN represents false negatives.
F1-Score: It is the harmonic mean of precision and recall, which provides a balance between the two metrics, especially when there is an uneven class distribution. F1-Score is mathematically represented as:

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
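These three per-class metrics can be computed directly from predicted labels. The helper below is an illustrative sketch, not the evaluation code used in the study:

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Per-class precision, recall, and F1 for a multi-class problem."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    metrics = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1))
    return metrics

# Tiny two-class example: one false positive for class 1.
m = per_class_metrics([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
```

Averaging the per-class values (macro-averaging) gives a single score that, unlike accuracy, is not dominated by the majority class.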
Furthermore, all experimental procedures are conducted in a simulated environment for a comprehensive evaluation. This allows us to gauge the effectiveness of the models in two ways: global performance evaluation and personalised model assessment. The global evaluation determines the proficiency of the model across the entire dataset using the global test set, which helps assess its generalisation capabilities. On the other hand, personalised performance assessment is done at the participant level, where the best global model is fine-tuned using local data. This personalised training creates a customised model, whose performance is evaluated using the local test set. We compare the results of the personalised and global models for each participant.
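The personalisation step, fine-tuning the converged global model on one client's local training split, can be sketched for a simple linear softmax classifier. The model, hyperparameters, and toy data below are illustrative assumptions; in the paper the model being personalised is the S-LSTM:

```python
import numpy as np

def fine_tune(global_W, local_X, local_y, lr=0.1, epochs=20):
    """Personalise a global linear softmax classifier on one client's data.

    Starts from the converged global weights and runs a few epochs of
    full-batch gradient descent on the client's local training split.
    """
    W = global_W.copy()
    n_classes = W.shape[1]
    for _ in range(epochs):
        logits = local_X @ W
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)            # softmax
        grad = local_X.T @ (probs - np.eye(n_classes)[local_y]) / len(local_y)
        W -= lr * grad                                       # gradient step
    return W

# One client's local split: two features, two activity classes.
local_X = np.array([[1.0, 0.0], [0.0, 1.0]])
local_y = np.array([0, 1])
personal_W = fine_tune(np.zeros((2, 2)), local_X, local_y)
preds = np.argmax(local_X @ personal_W, axis=1)
```

The personalised weights are then evaluated on the same client's local test split and compared against the untouched global model.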
Author Contributions
Conceptualization, A. Khan, H. Manzoor, and A. Zoha; methodology, A. Khan, H. Manzoor, F. Ayaz, and A. Zoha; software, A. Khan; validation, A. Khan, H. Manzoor, and A. Zoha; formal analysis, A. Khan, H. Manzoor, F. Ayaz, and A. Zoha; writing, original draft, A. Khan; writing, review and editing, A. Khan, H. Manzoor, F. Ayaz, M. Imran, and A. Zoha; supervision, A. Zoha.
Figure 1.
Conceptual framework of centralised indoor HAR using wearable sensors.
Figure 2.
Conceptual FL framework for outdoor HAR using wearable sensing.
Figure 3.
Propagation process of spiking neurons.
Figure 4.
Proposed hybrid S-LSTM model.
Figure 5.
Learning curve for the UCI dataset, trained for 500 communication rounds.
Figure 6.
The confusion matrices for the three DL models compared in this study. The indices correspond to the activities: ([1]: Walking, [2]: Walking upstairs, [3]: Walking downstairs, [4]: Sitting, [5]: Standing, [6]: Laying).
Figure 7.
Learning curve for the Real-World dataset, trained for 300 communication rounds.
Figure 8.
The confusion matrices for the three DL models compared on the Real-World dataset. The indices correspond to the activities: ([1]: climbing down, [2]: climbing up, [3]: jumping, [4]: lying, [5]: running, [6]: sitting, [7]: standing, [8]: walking).
Figure 9.
Learning curve for the Real-World dataset, with 50% of participants chosen at random, trained for 300 communication rounds.
Figure 10.
Accuracy comparison graph for global model and personalised model for each client using the local test set. The personalised accuracy is obtained after fine-tuning using the local dataset.
Table 1.
Comparative results of the global model for CNN, S-CNN, and S-LSTM on the UCI dataset, trained in a federated environment.
| Class | Precision (CNN) | Recall (CNN) | F1-Score (CNN) | Precision (S-CNN) | Recall (S-CNN) | F1-Score (S-CNN) | Precision (S-LSTM) | Recall (S-LSTM) | F1-Score (S-LSTM) |
|---|---|---|---|---|---|---|---|---|---|
| Walking | 0.98 | 0.97 | 0.98 | 0.93 | 0.95 | 0.94 | 0.99 | 0.99 | 0.99 |
| Walking upstairs | 0.98 | 0.98 | 0.98 | 0.93 | 0.96 | 0.95 | 0.99 | 0.99 | 0.99 |
| Walking downstairs | 0.98 | 0.99 | 0.98 | 0.93 | 0.88 | 0.91 | 1.00 | 1.00 | 1.00 |
| Sitting | 0.89 | 0.88 | 0.88 | 0.90 | 0.88 | 0.89 | 0.93 | 0.94 | 0.93 |
| Standing | 0.89 | 0.90 | 0.89 | 0.89 | 0.91 | 0.90 | 0.94 | 0.93 | 0.94 |
| Laying | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Table 2.
Comparison of classification metrics between different DL techniques on the Real-World dataset.
| Class | Precision (CNN) | Recall (CNN) | F1-Score (CNN) | Precision (S-CNN) | Recall (S-CNN) | F1-Score (S-CNN) | Precision (S-LSTM) | Recall (S-LSTM) | F1-Score (S-LSTM) |
|---|---|---|---|---|---|---|---|---|---|
| Climbing down | 0.86 | 0.88 | 0.87 | 0.91 | 0.89 | 0.90 | 0.90 | 0.88 | 0.89 |
| Climbing up | 0.87 | 0.85 | 0.86 | 0.91 | 0.87 | 0.89 | 0.90 | 0.87 | 0.88 |
| Jumping | 0.95 | 0.95 | 0.95 | 0.96 | 0.98 | 0.97 | 0.98 | 0.96 | 0.97 |
| Lying | 0.82 | 0.87 | 0.84 | 0.88 | 0.89 | 0.88 | 0.90 | 0.92 | 0.91 |
| Running | 0.95 | 0.85 | 0.90 | 0.97 | 0.86 | 0.91 | 0.94 | 0.86 | 0.90 |
| Sitting | 0.70 | 0.74 | 0.72 | 0.73 | 0.80 | 0.76 | 0.77 | 0.82 | 0.80 |
| Standing | 0.75 | 0.75 | 0.75 | 0.75 | 0.82 | 0.78 | 0.77 | 0.80 | 0.78 |
| Walking | 0.88 | 0.89 | 0.89 | 0.91 | 0.90 | 0.90 | 0.89 | 0.91 | 0.90 |