5.1. Experimental Settings
In this study, we simulate and validate the algorithm in a Python 3.7 and TensorFlow 2.2.0 environment. Our experiments emulate a distributed FL training scenario with various categories of devices. The setup consists of an aggregation server and 10 to 80 devices.
To demonstrate the robustness of the proposed approach, malicious nodes are introduced in the experiments to simulate devices with subpar training quality. During server aggregation, the parameters of a node can be modified to random values, simulating the presence of a malicious node in local training [32]. We initially select the MNIST dataset for training. The dataset is uniformly partitioned and allocated to the nodes as their local datasets. In addition, the CIFAR dataset is used in this study to validate the efficacy of the proposed algorithm.
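The malicious-node behavior described above can be sketched as follows; the function and variable names are illustrative and not taken from the paper's implementation:

```python
import numpy as np

def corrupt_update(weights, rng=None):
    """Emulate a malicious node: its uploaded parameters are replaced
    with random values before server aggregation (illustrative sketch)."""
    rng = rng or np.random.default_rng(0)
    return [rng.standard_normal(w.shape).astype(w.dtype) for w in weights]

# An honest two-tensor update gets replaced wholesale; shapes are preserved
# so the server cannot detect the attack from the update structure alone.
honest = [np.ones((4, 2), dtype=np.float32), np.zeros(2, dtype=np.float32)]
malicious = corrupt_update(honest)
```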
In our simulation, the client device's CPU cycle frequency adheres to a uniform distribution, while the wireless bandwidth follows a uniform distribution [33]. This setup allows for a diverse range of computational and communication capabilities among the devices.
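Under these assumptions, per-device capabilities could be sampled as below. The [0, 1] and [0, 2] ranges follow Table 2; the units and the dictionary-based interface are assumptions for illustration:

```python
import random

def sample_device_profiles(n, cpu_range=(0.0, 1.0), bw_range=(0.0, 2.0), seed=0):
    """Draw per-device CPU cycle frequency and wireless bandwidth from
    uniform distributions, matching the ranges listed in Table 2."""
    rng = random.Random(seed)
    return [
        {"cpu_freq": rng.uniform(*cpu_range), "bandwidth": rng.uniform(*bw_range)}
        for _ in range(n)
    ]

# Heterogeneous profiles for a 40-device experiment.
devices = sample_device_profiles(40)
```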
A convolutional neural network (CNN) serves as the FL training model, featuring a structure that consists of six convolutional layers, three pooling layers, and one fully connected layer. The DQN algorithm employs four threads to interact with the external environment, collecting empirical data in the process. The reward discount factor is set to 0.9, and the learning rate of the Q network (value network) is configured at 0.0001. The target network is updated with the parameters of the Q network after every 100 rounds of agent training. In addition, the buffer size for experience replay is set to 10,000. The settings of specific experimental parameters are shown in Table 2.
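The DQN hyperparameters listed above can be collected into a configuration sketch; the structure and function names below are illustrative, as the paper's code is not shown:

```python
from collections import deque

# DQN settings as reported in the text and Table 2.
DQN_CONFIG = {
    "num_agents": 4,             # parallel threads collecting experience
    "gamma": 0.9,                # reward discount factor
    "q_lr": 1e-4,                # learning rate of the Q (value) network
    "target_update_every": 100,  # rounds between target-network syncs
    "replay_buffer_size": 10_000,
    "batch_size": 64,
}

# Experience replay: old transitions are evicted once the buffer is full.
replay_buffer = deque(maxlen=DQN_CONFIG["replay_buffer_size"])

def maybe_sync_target(step, q_params):
    """Copy Q-network parameters to the target network every 100 rounds;
    returns the new target parameters, or None when no sync occurs."""
    if step % DQN_CONFIG["target_update_every"] == 0:
        return list(q_params)  # target <- Q
    return None
```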
In this study, we compare the proposed algorithm (FL-DQN) with three alternative approaches:
1) FL-Random: This algorithm does not utilize Deep Reinforcement Learning (DRL) for node selection during each iteration of FL training. Instead, it selects nodes at random.
2) FL-Greedy: The algorithm selects all participating nodes for model aggregation in each iteration of the FL training.
3) Local Training: This approach does not incorporate any FL mechanism; the model is trained solely on individual local devices [29].
These comparisons help to assess the relative effectiveness and efficiency of the FL-DQN algorithm.
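A minimal sketch contrasting the three selection policies follows; the `q_values` argument standing in for the learned per-node Q estimates is a hypothetical interface, not the paper's API:

```python
import random

def select_nodes(strategy, candidates, q_values=None, k=10, seed=0):
    """Pick participant nodes for one aggregation round."""
    if strategy == "greedy":   # FL-Greedy: aggregate every participating node
        return list(candidates)
    if strategy == "random":   # FL-Random: uniform sample, no DRL
        return random.Random(seed).sample(candidates, k)
    if strategy == "dqn":      # FL-DQN-style: top-k nodes by Q estimate
        ranked = sorted(candidates, key=lambda n: q_values[n], reverse=True)
        return ranked[:k]
    raise ValueError(f"unknown strategy: {strategy}")

nodes = list(range(20))
q = {n: n * 0.1 for n in nodes}  # toy Q estimates for illustration
```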
Table 2. Simulation parameter setting.
| Parameter Type | Parameter | Parameter Description | Parameter Value |
| --- | --- | --- | --- |
| Equipment and model parameters | | Number of terminals | 100 |
| | | CPU cycle frequency | [0, 1] |
| | | Wireless bandwidth | [0, 2] |
| | | Local dataset size | 600 |
| | | Local iterations | 2 |
| | | Minimum sample size | 10 |
| | | Learning rate | 0.01 |
| | Node | Number of nodes involved | [10, 80] |
| | | CPU cycles required per data bit for training | 7000 |
| | | Global model size | 20 Mbit |
| DQN parameters | A | Agents | 4 |
| | s | Training steps | 1000 |
| | Target Q | Q-network learning rate | 0.0001 |
| | | Reward discount factor | 0.9 |
| | | Target network update interval (rounds) | 100 |
| | | Experience replay buffer size | 10,000 |
| | B | Batch size | 64 |
5.2. Analysis of Results
Experiments evaluate the four algorithms on three metrics: accuracy, loss function value, and time delay. Given that the MNIST dataset is a classification problem, the accuracy in the experiment is defined as the ratio of the number of correct classifications to the total number of samples. This metric allows for a comprehensive comparison of the performance of the algorithms.
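The accuracy metric defined above can be written directly as:

```python
def classification_accuracy(y_true, y_pred):
    """Accuracy = number of correct classifications / total samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal-length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# e.g., 3 correct predictions out of 4 samples -> 0.75
assert classification_accuracy([1, 2, 3, 4], [1, 2, 3, 0]) == 0.75
```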
We divide the experiment into three groups to present the comparison of accuracy, loss function, and delay under different conditions.
Figure 3 presents the accuracy of four algorithms under different conditions of changing the number of iterations, the number of nodes, and the proportion of malicious nodes.
Figure 3a illustrates the accuracy variation of the four algorithms when 20% of the nodes are malicious. From
Figure 3a, it is evident that the accuracy of the models obtained through the four mechanisms is low during the early stages of training, suggesting that sufficient training iterations are necessary to ensure model accuracy. Upon reaching eight iterations, the accuracy of the models trained by FL-DQN, FL-Random, and FL-Greedy mechanisms tends to stabilize. When the number of iterations reaches 25, the accuracies of FL-DQN, FL-Random, FL-Greedy, and Local Training stabilize around 0.98, 0.96, 0.96, and 0.95, respectively. The FL-DQN algorithm maintains strong training performance when faced with a limited number of malicious nodes and varying data quality.
Figure 3b displays the accuracy variation of the four algorithms when 40% of the device nodes are malicious. As observed in
Figure 3b, FL-DQN rapidly converges to the highest accuracy (0.98) when confronted with a large number of malicious nodes. In contrast, the model quality obtained by FL-Random decreases due to the influence of malicious nodes and stabilizes around 0.95, which is comparable to the training performance of Local Training. For the FL-Greedy algorithm, the accuracy decreases to 0.93. The FL mechanism proposed in this study successfully balances data quality and device training, effectively ensuring optimal model quality.
Figure 3c presents the model accuracy achieved by the four algorithms for different numbers of nodes. The FL-DQN algorithm achieves the highest accuracy across the various node quantities. For example, when 40 nodes are considered, the accuracies of the four algorithms are 0.967, 0.938, 0.932, and 0.754, respectively. The accuracy of the FL-DQN algorithm improves by 3.0% and 22.0% compared to FL-Random and Local Training, respectively. The results also demonstrate that the proposed method exhibits strong scalability in terms of node size, maintaining peak performance as the number of nodes increases.
Figure 3d presents the accuracy of the models obtained by the four algorithms for a fixed number of training rounds (30 rounds) and different percentages of malicious nodes (ranging from 10% to 80%). We observe that FL-DQN can efficiently filter out high-quality nodes for model aggregation when dealing with different proportions of malicious nodes, in contrast to FL-Random, FL-Greedy and Local Training. This filtering process ensures the quality of the overall model, leading to the highest accuracy and smallest loss function values in FL-DQN. As a result, it can be concluded that the proposed method exhibits excellent robustness.
Figure 3. Accuracy experimental group.
Figure 4 presents the loss functions of four algorithms under different conditions of changing the number of iterations, the number of nodes, and the proportion of malicious nodes.
Figure 4a presents the variation of the loss functions for the four algorithms when 20% of the nodes are malicious. The FL-DQN algorithm converges faster than the remaining three algorithms and exhibits the lowest loss function value. This highlights the advantages of the FL-DQN approach in terms of convergence and loss reduction.
Figure 4b shows the variation of the loss functions of the four algorithms when 40% of the nodes are malicious. Similar to the convergence in accuracy, the FL-DQN algorithm converges faster and attains the smallest loss function value among the four algorithms, while FL-Random, FL-Greedy, and Local Training always have higher loss function values due to the presence of malicious nodes. Comparing the above simulation results, it can be seen that FL-DQN always converges quickly to the highest accuracy under different numbers of malicious nodes and has the lowest loss function compared to FL-Random, FL-Greedy, and Local Training. At the same time, the FL-DQN algorithm maintains an accuracy of 0.98 for both 20% and 40% malicious nodes. Therefore, it can be concluded that the proposed method is remarkably robust.
Figure 4c presents the loss function achieved by the four algorithms for different numbers of nodes. The FL-DQN algorithm achieves the lowest loss function value across the various node quantities. The difference between the FL-Random and FL-Greedy loss functions is not significant.
Figure 4d shows the loss function values of the four algorithms obtained at the end of training under the same conditions. Compared to FL-Random, FL-Greedy, and Local Training, FL-DQN can handle different proportions of malicious nodes to guarantee the quality of the whole model and thus obtain the minimum of the loss function.
Figure 4. Loss function experimental group.
Figure 5 presents the latency of four algorithms under different conditions of changing the number of iterations, the number of nodes, and the proportion of malicious nodes.
Figure 5a shows the changes in latency of four algorithms with 20% malicious nodes.
Figure 5b shows the changes in latency of the four algorithms with 40% malicious nodes. Compared to the remaining three algorithms, the FL-DQN algorithm is able to complete the training task faster and has the smallest variation in the time used per round.
As shown in
Figure 5c, the FL-DQN algorithm can guarantee low latency when dealing with various numbers of nodes, as it can effectively select high-quality training devices for model aggregation. With 40 nodes, the latency values for the four algorithms are 11.3 s, 13.8 s, 17.4 s, and 16.4 s, respectively. The FL-DQN algorithm reduces the latency by 18%, 35%, and 31% compared to FL-Random, FL-Greedy, and Local Training, respectively. These results indicate that the proposed algorithm can efficiently complete FL training.
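The quoted reductions are consistent with relative savings computed against each baseline's latency:

```python
# Per-round latency (seconds) at 40 nodes, as reported for Figure 5c.
latency = {"FL-DQN": 11.3, "FL-Random": 13.8, "FL-Greedy": 17.4, "Local": 16.4}

def reduction_vs(baseline):
    """Relative latency saving of FL-DQN versus a baseline, in percent."""
    return round(100 * (latency[baseline] - latency["FL-DQN"]) / latency[baseline])

# Reproduces the 18%, 35%, and 31% figures quoted in the text.
savings = (reduction_vs("FL-Random"), reduction_vs("FL-Greedy"), reduction_vs("Local"))
```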
Figure 5d presents the latency changes of four algorithms with fixed training rounds (30 rounds) and different percentages of malicious nodes (from 10% to 80%). From
Figure 5d, it can be observed that even when facing a large number of malicious nodes, the FL-DQN algorithm can still complete the training task relatively quickly.
Figure 5. Latency experimental group.
The following section compares and validates four algorithms using the CIFAR dataset.
Figure 6 illustrates the accuracy variations of the four algorithms with 20% malicious device nodes. The CIFAR dataset requires significantly more training iterations than the MNIST dataset. When the number of iterations reaches 60, the accuracy of the models trained by the four algorithms stabilizes. The accuracies of FL-DQN, FL-Random, FL-Greedy, and Local Training stabilize at 0.80, 0.71, 0.68, and 0.58, respectively. The FL-DQN algorithm demonstrates good training performance in handling malicious nodes and differential data quality.
Figure 6. Accuracy comparison of CIFAR dataset.
Figure 7 displays the loss function variations of the four algorithms with 20% malicious device nodes. The FL-DQN algorithm achieves faster convergence and has the smallest loss function value. These results demonstrate that the FL-DQN algorithm outperforms the baseline algorithms in terms of loss function convergence.
Figure 7. Comparison of loss functions for CIFAR dataset.