3.4. Particle Swarm Optimization (PSO)
PSO is used to fine-tune the control variables (e.g., voltage magnitudes, generator outputs) to minimize the objective function. In PSO, a swarm of particles moves through the search space, with each particle representing a potential solution. The position of each particle is updated based on its velocity, which is influenced by the best position it has found so far and the best position found by the entire swarm.
The update equations for PSO are:
1. Velocity Update:
$v_i^{k+1} = w\, v_i^{k} + c_1 r_1 \left(p_{best,i} - x_i^{k}\right) + c_2 r_2 \left(g_{best} - x_i^{k}\right)$
where:
- $v_i^{k}$ is the velocity of particle $i$ at iteration $k$,
- $w$ is the inertia weight,
- $c_1, c_2$ are acceleration coefficients,
- $r_1, r_2$ are random numbers between 0 and 1,
- $p_{best,i}$ is the best position found by particle $i$ so far,
- $g_{best}$ is the best position found by the swarm.
2. Position Update:
$x_i^{k+1} = x_i^{k} + v_i^{k+1}$
where $x_i^{k}$ is the position of particle $i$ at iteration $k$.
The algorithm iterates through multiple steps, updating positions and velocities until convergence.
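To make these update rules concrete, the following Python sketch implements a single PSO iteration for a generic minimization objective. It is a minimal illustration rather than the exact implementation used in this work; the function name pso_step and the parameter values (w = 0.7, c1 = c2 = 1.5) are illustrative assumptions.

```python
import numpy as np

def pso_step(x, v, p_best, p_best_val, g_best, objective,
             w=0.7, c1=1.5, c2=1.5, bounds=None):
    """One PSO iteration: velocity update, position update, best-record refresh.
    Parameter values (w, c1, c2) are illustrative assumptions."""
    n_particles, dim = x.shape
    r1 = np.random.rand(n_particles, dim)
    r2 = np.random.rand(n_particles, dim)
    # Velocity update: inertia + cognitive (personal-best) + social (global-best) terms
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    # Position update; optionally clip to the feasible search range
    x = x + v
    if bounds is not None:
        x = np.clip(x, bounds[0], bounds[1])
    # Evaluate the objective and refresh personal and global bests (minimization)
    vals = np.array([objective(p) for p in x])
    improved = vals < p_best_val
    p_best[improved] = x[improved]
    p_best_val[improved] = vals[improved]
    g_best = p_best[np.argmin(p_best_val)]
    return x, v, p_best, p_best_val, g_best
```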
3.5. Deep Reinforcement Learning (DRL)
In DRL, the grid control actions are modeled as a sequential decision-making process, where the agent learns an optimal control policy through interactions with the environment. The environment provides feedback in the form of a reward based on the current state of the grid and the action taken by the agent.
The optimization problem in DRL is formulated as a Markov Decision Process (MDP) with the following components:
- State (s): The state of the grid, including bus voltages, power flows, and generator outputs.
- Action (a): The control actions, such as adjusting generation set-points or switching capacitor banks.
- Reward (r): The reward is defined based on the reduction of power losses and the maintenance of voltage levels within desired limits. A typical reward function might be (a coded sketch follows this list):
$r = -\sum_{i} P_{loss,i} - \lambda \sum_{i} \left| V_i - V_{ref} \right|$
where:
- $P_{loss,i}$ is the power loss at bus $i$,
- $V_{ref}$ is the reference voltage,
- $\lambda$ is a weighting factor that balances loss reduction and voltage deviation.
- Policy (π): The policy determines the optimal action to take given the current state, and it is updated using a DRL algorithm such as Proximal Policy Optimization (PPO).
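As a minimal illustration of the reward defined above, the sketch below computes it from per-bus losses and voltage magnitudes; the variable names, the reference voltage of 1.0 p.u., and the weighting value lam = 0.5 are illustrative assumptions rather than the values used in this study.

```python
import numpy as np

def compute_reward(p_loss, v_bus, v_ref=1.0, lam=0.5):
    """Reward = -(total power loss) - lambda * (total voltage deviation).
    p_loss: per-bus power losses, v_bus: per-bus voltage magnitudes (p.u.).
    v_ref and lam are illustrative assumptions, not tuned values."""
    loss_term = np.sum(p_loss)
    voltage_term = np.sum(np.abs(v_bus - v_ref))
    return -(loss_term + lam * voltage_term)
```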
3.6. Hybrid DRL-PSO Algorithm Integration
The hybrid DRL-PSO algorithm works by first using DRL to predict the optimal control actions based on the current grid state. These actions are then refined using PSO to further minimize the objective function (power loss) and ensure global optimization. The combined framework leverages the strengths of both methods: DRL’s ability to adapt in real-time and PSO’s capability to fine-tune the solution.
1. DRL Agent: The DRL agent uses a deep neural network to learn optimal power flow actions. The agent interacts with the smart grid environment, receiving state information (e.g., voltage levels, power flow) and taking actions to minimize losses. The reward function is designed to penalize high losses and voltage violations.
The value of the state-action pair $(s_t, a_t)$ at time step $t$ is updated according to the Bellman equation:
$Q(s_t, a_t) = r_t + \gamma \max_{a'} Q(s_{t+1}, a')$
where $r_t$ is the reward received at time $t$ and $\gamma$ is the discount factor.
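For illustration, a tabular form of this Bellman update is sketched below. The actual agent uses a deep neural network trained with PPO, so the lookup table, learning rate alpha, and discount gamma shown here are simplifying assumptions.

```python
import numpy as np

def bellman_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Tabular illustration of Q(s_t, a_t) <- r_t + gamma * max_a' Q(s_{t+1}, a').
    Q: 2-D array indexed by discrete state and action; alpha and gamma are assumed values."""
    target = r + gamma * np.max(Q[s_next])     # Bellman target
    Q[s, a] += alpha * (target - Q[s, a])      # move the estimate toward the target
    return Q
```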
2. PSO Optimization: PSO is used to fine-tune the control variables (e.g., generator set-points, capacitor switching) by optimizing the objective function (the power loss $P_{loss}$). Each particle in the swarm represents a potential solution, and particles adjust their positions based on their individual and swarm-wide best experiences:
$v_i^{k+1} = w\, v_i^{k} + c_1 r_1 \left(p_{best,i} - x_i^{k}\right) + c_2 r_2 \left(g_{best} - x_i^{k}\right)$
$x_i^{k+1} = x_i^{k} + v_i^{k+1}$
where $v_i$ is the velocity of particle $i$, $x_i$ is the position, $p_{best,i}$ is the individual best position, and $g_{best}$ is the global best position.
The objective is to minimize the total power loss $P_{loss}$, while ensuring that the voltage at each bus is maintained within acceptable limits, typically:
$V_{min} \leq V_i \leq V_{max}$
where:
- $V_i$ is the voltage at bus $i$,
- $V_{min}$ and $V_{max}$ are the minimum and maximum voltage limits, respectively.
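A common way to handle this constraint inside PSO is to add a penalty to the objective whenever a particle violates the voltage limits. The sketch below assumes a hypothetical power_loss(x) routine that runs a load flow and returns the total loss and per-bus voltages; that routine, the limits of 0.95-1.05 p.u., and the penalty weight are illustrative assumptions.

```python
import numpy as np

def penalized_objective(x, power_loss, v_min=0.95, v_max=1.05, penalty=1e3):
    """Objective for PSO: total loss plus a penalty for voltage-limit violations.
    power_loss(x) is assumed to return (total_loss, per-bus voltages); the
    voltage limits (p.u.) and penalty weight are illustrative assumptions."""
    total_loss, v_bus = power_loss(x)
    violation = np.sum(np.maximum(0.0, v_min - v_bus) +
                       np.maximum(0.0, v_bus - v_max))
    return total_loss + penalty * violation
```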
Figure 1 illustrates the proposed DRL-PSO optimization framework, which is described below as a flowchart:
1. Start:
o The algorithm initializes the smart grid model and sets up DRL and PSO frameworks.
2. Environment Setup:
o Define the state of the grid (bus voltages, power flows, generator outputs).
o Set up constraints, including voltage limits, load demand, and power generation limits.
3. Deep Reinforcement Learning (DRL) Process:
o Step 1: State Input: The DRL agent observes the current grid state.
o Step 2: Action Selection: The DRL agent selects an action (e.g., adjusting voltage set-points or switching capacitors) based on the learned policy.
o Step 3: Reward Calculation: Compute the reward based on power loss reduction and voltage regulation.
o Step 4: Policy Update: The agent updates its policy using a DRL algorithm (such as PPO).
4. Check for Convergence:
o If not converged: Repeat the DRL process by refining the policy and control actions.
o If converged: Proceed to the PSO process for fine-tuning.
5. Particle Swarm Optimization (PSO) Process:
o Step 1: Initialize Swarm: Generate a swarm of particles representing potential solutions (control variables such as voltage levels or generator outputs).
o Step 2: Velocity and Position Update: Each particle’s velocity and position are updated based on its best-found solution and the global best.
o Step 3: Evaluate Objective Function: For each particle, calculate the power losses and ensure constraints (e.g., voltage limits) are satisfied.
o Step 4: Update Swarm Bests: Update the best positions and velocities of the particles.
o Step 5: Check for Convergence: If the swarm has converged, the final control solution is determined.
6. Final Solution:
o The optimal control actions are selected, minimizing power losses and maintaining voltage levels within desired limits.
7. End:
o The algorithm terminates with the optimized grid control actions.
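To summarize the flowchart in code form, the sketch below arranges the DRL training stage and the PSO fine-tuning stage in sequence. The environment interface (reset/step/power_loss), the agent object, and the reuse of the pso_step and penalized_objective helpers from the earlier sketches are structural assumptions, not the exact implementation used in this work.

```python
import numpy as np

def hybrid_drl_pso(env, agent, n_episodes=200, n_pso_iters=50, swarm_size=30):
    """High-level DRL-PSO loop: train the DRL policy, then refine its control
    actions with PSO. All interfaces are assumed for illustration only."""
    # Stage 1: DRL process - interact with the grid environment and update
    # the policy (e.g., with PPO) episode by episode.
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.select_action(state)                 # policy action
            next_state, reward, done, _ = env.step(action)
            agent.store_transition(state, action, reward, next_state, done)
            state = next_state
        agent.update_policy()                                   # PPO-style update

    # Stage 2: PSO process - seed a swarm around the DRL action and fine-tune.
    state = env.reset()
    drl_action = np.asarray(agent.select_action(state))
    objective = lambda p: penalized_objective(p, env.power_loss)
    x = drl_action + 0.05 * np.random.randn(swarm_size, drl_action.size)
    v = np.zeros_like(x)
    p_best, p_best_val = x.copy(), np.array([objective(p) for p in x])
    g_best = p_best[np.argmin(p_best_val)]
    for _ in range(n_pso_iters):
        x, v, p_best, p_best_val, g_best = pso_step(
            x, v, p_best, p_best_val, g_best, objective)
    return g_best                                               # refined control actions
```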