3.2. Small-scale Power Grid Faults
In this section, we simulate small-scale faults in the power grid of the TNCS. Specifically, power lines 31, 32, 33, 34, 36, and 37 are faulted and placed in an open-circuit state. The experimental parameter settings are as follows.
The Actor network consists of an input layer with 142 neurons, a first hidden layer with 256 neurons, a second hidden layer with 128 neurons, and an output layer with 6 neurons. The target Actor network is updated once after every five updates of the Actor network.
The Critic network consists of an input layer with 142 neurons, a first hidden layer with 256 neurons, a second hidden layer with 256 neurons, and an output layer with 1 neuron. The target Critic network is updated once after every five updates of the Critic network.
The ReLU function is used as the activation function in the hidden layers of all networks. The Actor network is updated once after every three updates of the Critic network. We set the discount factor to 0.99, the exploration noise standard deviation to 0.15, the policy noise standard deviation to 0.3, the batch size to 64, the total number of episodes to 1000, and the , , , and values to 0.5, 0.1, 0.1, and 0.005, respectively. The parameter , specifically, is used in the soft update.
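As a rough illustration of these settings, the following Python sketch counts the trainable parameters implied by the stated layer sizes and traces the stated update schedule (Actor once per three Critic updates, each target network once per five updates of its online network). It assumes fully connected layers with bias terms, and it assumes the soft update has the common form target = tau * online + (1 - tau) * target with tau taken to be the last listed value, 0.005. All function and variable names are illustrative and are not taken from the original implementation.

```python
# Layer sizes as stated in the text (input, hidden 1, hidden 2, output).
ACTOR_LAYERS = [142, 256, 128, 6]
CRITIC_LAYERS = [142, 256, 256, 1]


def count_parameters(layers):
    """Weights plus biases of a fully connected network (assumed layout)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


def update_schedule(critic_updates, actor_every=3, target_every=5):
    """Number of updates each network receives for a given number of Critic updates."""
    actor_updates = critic_updates // actor_every
    return {
        "critic": critic_updates,
        "actor": actor_updates,
        "target_critic": critic_updates // target_every,
        "target_actor": actor_updates // target_every,
    }


def soft_update(target, online, tau=0.005):
    """Assumed soft-update rule: blend online weights into the target weights."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


print(count_parameters(ACTOR_LAYERS))   # 70278
print(count_parameters(CRITIC_LAYERS))  # 102657
print(update_schedule(150))
```

Under these assumptions, 150 Critic updates correspond to 50 Actor updates, 30 target Critic updates, and 10 target Actor updates.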
The simulation results are as follows.
The result in Figure 5 indicates that the Actor network is well-trained and able to make accurate and effective decisions.
The obtained restoration scheme is shown in Table 1. The restoration sequence is primarily determined by the order of line restoration; once a line is restored, the connected nodes automatically begin their restoration process (except for black-start nodes). The restoration sequence refers specifically to the order of initiating restoration. As a result, the restoration processes for multiple faults can proceed simultaneously. For example, the restoration of line 36 can be initiated during the restoration of node 1.
To validate the optimality of the restoration sequence, we compared it with several backup schemes based on the restoration benefit criterion. As shown in Table 2, the restoration benefit obtained by our proposed scheme, denoted
, is 15.36% to 25.52% greater than that of the other schemes, indicating that our scheme achieves the highest restoration benefit among the compared schemes.
Additionally, Figure 6 shows the improvement effect of the TD3 algorithm.
It can be observed that the occurrence number of is lower in the improved algorithm compared to the unimproved version. This indicates that the evaluation of restoration actions in the improved algorithm is not solely based on maximizing , but also considers system security issues caused by energy supply-demand imbalances, as represented by the inclusion of a multiplicative term. Using this method, it is possible to lower the original estimated values , as shown by the lower values of the line in the figure for the improved TD3 algorithm compared to the unimproved TD3 algorithm. When the chosen restoration action leads to an energy supply-demand imbalance, equals 1, the estimated value decreases, and the Actor network receives feedback indicating that the chosen restoration action is not optimal. As the number of episodes increases, the Actor network gradually learns to avoid restoration actions that may cause energy supply-demand imbalances. Consequently, in the figure, we can observe that the frequency of decreases over time, with only three occurrences between episodes 800 and 1000, representing a probability of 1.5%. This is a 66.67% reduction compared to the unimproved TD3 algorithm, and demonstrates the effectiveness of our improvements to the TD3 algorithm in reducing the probability of system security issues caused by energy supply-demand imbalances during the restoration process.
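The percentages quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the window "between episodes 800 and 1000" contains 200 episodes and that the 66.67% reduction is measured over the same window; the implied count for the unimproved algorithm is an inference from those two figures, not a value stated in the text. The function names are illustrative.

```python
def occurrence_rate(count, window_episodes):
    """Fraction of episodes in the window in which the event occurred."""
    return count / window_episodes


def implied_baseline_count(improved_count, relative_reduction):
    """Baseline count that would give the stated relative reduction."""
    return improved_count / (1.0 - relative_reduction)


window = 1000 - 800                       # assumed window length: 200 episodes
rate = occurrence_rate(3, window)         # three occurrences in the window
baseline = implied_baseline_count(3, 2.0 / 3.0)

print(f"improved rate: {rate:.1%}")                           # 1.5%
print(f"implied unimproved count: {round(baseline)}")         # 9
print(f"implied unimproved rate: {round(baseline) / window:.1%}")  # 4.5%
```

Read this way, three occurrences in 200 episodes give 1.5%, and a two-thirds reduction would correspond to roughly nine occurrences (4.5%) for the unimproved TD3 algorithm in the same window.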
3.3. Large-scale Power Grid Faults
We set all power lines to an open-circuit state, paralyzing the power grid in the TNCS. In this scenario, we analyze the impact of uncertainties such as failed unit start-up and new faults caused by the restoration process, and we compare the algorithm proposed in this study with other algorithms to evaluate its performance and efficacy.
We set the batch size to 128 and the total number of episodes to 5000; the remaining experimental parameters are identical to those in the small-scale fault scenario. The simulation results are shown below.
We can observe that curve 3 gradually converges to a stable value in Figure 7, which indicates that the Actor network is well-trained and able to make accurate and effective decisions in the scenario of large-scale power grid faults. Furthermore, curve 3 exhibits the lowest average reward, and both curve 3 and curve 1 fluctuate significantly more than curve 2. This is because both generator start-up failures and new faults occurring during the restoration process result in a reward value of -1. Consequently, curve 3 exhibits lower values than the other two curves. The presence of new faults increases the uncertain change of
and the difficulty of convergence, leading to higher fluctuations in curve 3 and curve 1.
To examine the impact of uncertainties on the restoration sequence, we take power line 42 as an example and observe how its position in the restoration sequence changes; the results are shown in Figure 8. In the case corresponding to curve 3, power line 42 is restored at approximately the 17th step across 10 experiments, which differs from the cases corresponding to curves 1 and 2. This indicates that uncertainties during the restoration process can alter the restoration sequence and should not be ignored in a practical restoration process.
At the end of this section, we also compare the proposed algorithm with other intelligent algorithms to verify its superiority in solving the fault recovery problem of the TNCS. The results are shown in Table 3. The proposed algorithm outperforms the other algorithms by 0.3% to 20.96% in terms of restoration benefit. In terms of convergence time, it achieves a reduction of 2.82% to 14.39% compared to all other algorithms except the PSO algorithm, relative to which it shows only a marginal increase of 1.2%. These comparative results demonstrate that the proposed algorithm has advantages in both restoration benefit and convergence time, indicating its superiority in an overall assessment. Moreover, although its restoration benefit is very close to that of the TD3 algorithm, the proposed algorithm exhibits an 8.93% reduction in time. This is due to the improvement made to the TD3 algorithm, which reduces the occurrence of new faults during the restoration process and effectively reduces the uncertain change of
, thereby reducing convergence difficulties during training.
3.4. Communication Faults
In this section, we add communication faults to the large-scale power grid fault scenario to validate the resilience of the algorithm proposed in this study.
Specifically, we assume that DoS attackers send a significant volume of disguised packets to information nodes 31, 32, 33, and 34, resulting in their infection. As a result, the communication channels within the coupling layer connected to these information nodes and the communication links within the information network of the control layer become blocked. When communication delays are detected, the technical staff at the control center suspend the restoration of power grid faults and initiate the deployment of firewalls and intrusion detection systems to clear the communication faults. Equation (25) shows that the impact of communication delays on restoration benefit depends on their occurrence timing and restoration speed. To facilitate statistical analysis, the occurrence time of communication delays is divided into six stages: stage 1: 1 to 6 steps, stage 2: 7 to 12 steps, stage 3: 13 to 18 steps, stage 4: 19 to 24 steps, stage 5: 25 to 30 steps, and stage 6: more than 30 steps. We set , with corresponding to the first stage, corresponding to the second stage, etc. There are five levels of restoration speed: level 1: , level 2: , level 3: , level 4: , and level 5: . Among them, indicates that the restoration of communication delay requires one step, requires two steps, etc. We set and . The remaining experimental parameters are the same as in the previous section. The simulation results are shown as follows.
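The six-stage division of the occurrence time can be expressed as a small lookup. The following Python sketch maps a restoration step to its stage using only the boundaries listed above; the function name and its default arguments are illustrative assumptions.

```python
def delay_stage(step, stage_width=6, last_stage=6):
    """Map the step at which the communication delay occurs to its stage.

    Stages 1 to 5 each cover six steps (1-6, 7-12, ..., 25-30);
    every step beyond 30 falls into stage 6.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return min((step - 1) // stage_width + 1, last_stage)


for step in (1, 6, 7, 18, 30, 31, 45):
    print(step, delay_stage(step))
```

For example, steps 6 and 7 fall into stages 1 and 2, respectively, and both step 31 and step 45 fall into stage 6.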
As shown in Figure 9, the restoration benefit of the communication delay curves is lower than that of the normal communication curve, because communication delay causes a reduction in the
value of certain power lines, as illustrated in Figure 10. Figure 9 also illustrates two further aspects: first, the impact of the occurrence time of DoS attacks on restoration, and second, the impact of the restoration speed of communication delay on restoration.
For the first aspect, the timing of DoS attacks is uncertain, but the reduction in restoration benefit caused by DoS attacks can be mitigated by adjusting the value of
. According to
Table 4, when communication delay occurs in stage 1, stage 2, and stage 3, setting
to its maximum value of 1 limits the reduction in restoration benefit to 9.23%, 4.22%, and 0.68%, respectively, relative to the normal communication benefit of 12.3968 MW. However, in stage 4, setting
to 1 leads to a restoration benefit of 12.510 MW, which exceeds the normal benefit and is unrealistic. Therefore,
cannot be set to 1, and
results in a restoration benefit of 12.267 MW, which is marginally less than 12.3968 MW. Thus, the valid range for
in stage 4 is
. Similarly, in stage 5 and stage 6, the valid range for
is
and
, respectively.
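The validity argument above reduces to a comparison against the normal communication benefit. The Python sketch below converts the quoted percentage reductions into benefit values and applies the check that a delayed case must not exceed the normal benefit of 12.3968 MW. It reads the two stage-4 figures as 12.510 MW and 12.267 MW; the function names are illustrative, and the rule "benefit under delay must not exceed the normal benefit" is our reading of the text rather than a formula given in it.

```python
NORMAL_BENEFIT_MW = 12.3968  # restoration benefit under normal communication


def benefit_after_reduction(reduction_percent, normal=NORMAL_BENEFIT_MW):
    """Benefit implied by a percentage reduction relative to the normal benefit."""
    return normal * (1.0 - reduction_percent / 100.0)


def is_realistic(benefit_mw, normal=NORMAL_BENEFIT_MW):
    """A delayed case should not yield more benefit than the normal case."""
    return benefit_mw <= normal


# Stages 1 to 3: reductions quoted in the text.
for stage, reduction in [(1, 9.23), (2, 4.22), (3, 0.68)]:
    print(stage, round(benefit_after_reduction(reduction), 4))

# Stage 4: the two candidate settings discussed in the text.
print(is_realistic(12.510))  # False: exceeds the normal benefit
print(is_realistic(12.267))  # True: marginally below the normal benefit
```

The same check, applied to each candidate setting in stages 5 and 6, would reproduce the valid ranges reported for those stages.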
For the second aspect, it can be observed that the restoration benefit of the communication delay curve corresponding to
is greater than that of the curves corresponding to the other restoration speeds in all restoration stages. This is because restoring communication delay yields the reward
. If the restoration speed is fast, for example when the restoration is completed in a single step, one
is obtained. Alternatively, multiple
can be obtained if the restoration speed is slow and requires multiple steps to complete. However,
and
are not equivalent, and a difference exists between them. Obtaining multiple
can lead to overestimation or underestimation of
, which may result in policy bias and affect restoration decision-making. To solve this issue, we can adjust the values of
and
to reduce the difference between
and
. It is important to note that we cannot directly control the restoration speed due to its uncertainty and dependence on the defensive deployment level of the information network. When the occurrence time of DoS attacks is fixed, we can adjust the value of
to mitigate the policy bias caused by restoration speed.
Figure 11 illustrates the impact of different values of
on policy bias when
.
It can be observed from Figure 11 that as the value of
decreases, the number of occurrences of
decreases regardless of the restoration speed. First, this indicates that when
, the value of
is greater than
, resulting in a
overestimation. To solve this, the value of
can be appropriately reduced, taking into account the occurrence time of DoS attacks. Second, the minimum number of occurrences of
is 60.26% to 80.12% lower than the maximum number of occurrences across the five restoration speeds when adjusting the value of
, demonstrating that adjusting the value of
effectively resolves the policy bias issue caused by restoration speed.
The average occurrence number of
for the five restoration speeds under each
value in Figure 11 is calculated to represent the actual policy bias under normal restoration speed. The results are shown in Table 5, where the difference ratio indicates the percentage reduction in the average occurrence number of
for the next
value compared to the current
value. For instance, when the value of
changes from 1 to 0.9, the average occurrence number of
reduces by 11.29%, resulting in a difference ratio of 11.29%. We can observe that as the value of
reduces from 1, the difference ratio increases continuously. When the value of
changes from 0.4 to 0.3, the difference ratio reaches its maximum value of 24.38%, and as the value of
continues to reduce, the difference ratio also reduces. This indicates that under the condition of
, the optimal range of
value for minimizing the difference between
and
is around 0.3, leading to a more effective improvement in solving the problem of policy bias.
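The difference ratio described above is the relative drop between consecutive entries of the averaged occurrence numbers. The Python sketch below computes it for a list of averages and locates the largest ratio. The input values are hypothetical, chosen only so that the first ratio matches the 11.29% quoted in the text; they are not the values in Table 5, and the function names are illustrative.

```python
def difference_ratios(avg_counts):
    """Percentage reduction from each average to the next one in the list."""
    return [
        (current - nxt) / current * 100.0
        for current, nxt in zip(avg_counts, avg_counts[1:])
    ]


def index_of_largest_ratio(ratios):
    """Position of the transition with the largest percentage reduction."""
    return max(range(len(ratios)), key=lambda i: ratios[i])


# Hypothetical averaged occurrence numbers for three consecutive settings.
hypothetical_averages = [62.0, 55.0, 44.0]
ratios = difference_ratios(hypothetical_averages)

print([round(r, 2) for r in ratios])   # [11.29, 20.0]
print(index_of_largest_ratio(ratios))  # 1
```

Applied to the full set of averages, the transition with the largest ratio marks the setting around which further reduction stops paying off, which is how the value of about 0.3 is identified in the text.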
Based on the aforementioned analysis, for uncertainties that cannot be directly controlled, such as the occurrence time of a DoS attack and the restoration speed of communication delay, we can adjust the value of to mitigate the extent of restoration loss under various occurrence times of the DoS attack. Additionally, we can reduce the policy bias caused by restoration speed by adjusting the value of . The experimental results demonstrate that the proposed algorithm exhibits strong adaptability and resilience in the presence of communication delay caused by DoS attacks.