A peer-reviewed article of this preprint also exists.
This version is not peer-reviewed
Submitted: 01 October 2024
Posted: 04 October 2024
Abbreviation | Definition |
---|---|
RL | Reinforcement learning |
MDP | Markov decision process |
EM | Energy market |
GSAC | Grid stability and control |
BEM | Building energy management |
EV | Electric vehicle |
ESS | Energy storage system |
PV | Photovoltaic |
MG | Micro-grid |
DR | Demand-response |
ET | Energy trading |
AC | Actor-critic & variations |
PG | Policy gradient & variations |
QL | Q-learning & variations |
Problem | States | Actions | Reward |
---|---|---|---|
EM | System operator decides upon a power flow distribution | Firms set their bid | The firms’ rewards are the net profit achieved |
GSAC | Voltage levels at different nodes | Adjusting the output of power generators | Cost of deviating from nominal voltage levels |
BEM | Indoor temperature and humidity levels | Adjusting thermostat setpoints for heating and cooling | Cost of electricity |
EV | Traffic conditions and route information | Selecting a route based on traffic and charging station availability | Cost of charging, considering electricity prices and charging station fees |
ESS | Battery state of charge and current consumer power demand | The controller decides how much power to produce using the generator | The power generation cost the controller must pay |
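
The ESS row above can be read as a small MDP: the state is the battery state of charge plus the current demand, the action is the generator output, and the reward is the negative generation cost. The following is a minimal sketch of that reading with tabular Q-learning (QL). All constants, the shortfall penalty, the random demand model, and the function names `step` and `q_learning` are hypothetical choices made for illustration; none of them comes from the surveyed works.

```python
import random

# Hypothetical toy parameters (assumptions, not taken from the source).
CAPACITY = 4              # battery capacity, in energy units
DEMANDS = (1, 2, 3)       # possible consumer demand per time step
MAX_GEN = 3               # maximum generator output per time step
COST_PER_UNIT = 1.0       # generation cost per energy unit
SHORTFALL_PENALTY = 10.0  # assumed penalty per unit of unmet demand


def step(soc, demand, gen, rng):
    """One transition: state (soc, demand), action gen, reward = -cost."""
    supply = soc + gen
    unmet = max(0, demand - supply)
    # Surplus charges the battery; anything above capacity is discarded.
    next_soc = min(CAPACITY, max(0, supply - demand))
    reward = -(COST_PER_UNIT * gen + SHORTFALL_PENALTY * unmet)
    next_demand = rng.choice(DEMANDS)  # assumed i.i.d. demand
    return (next_soc, next_demand), reward


def q_learning(episodes=2000, horizon=24, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    actions = list(range(MAX_GEN + 1))
    q = {}  # maps (state, action) to an estimated return, default 0.0
    for _ in range(episodes):
        state = (0, rng.choice(DEMANDS))
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, reward = step(*state, action, rng)
            best_next = max(q.get((next_state, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q
```

Calling `q_learning()` returns the learned table; the greedy action for a state is the one with the largest table entry. The other rows of the table could be sketched the same way by swapping the state, action, and reward definitions.
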
Ref. | Application | Algorithm | State-space | Policy | Dataset & simulator |
---|---|---|---|---|---|
[35] | ET | AC | Continuous | Deterministic | [43] |
[37] | ET | AC | Discrete | Stochastic | [44] |
[38] | ET | QL | Continuous | Deterministic | [45] |
[40] | ET | Other | Discrete | Stochastic | Simulated data |
[41] | ET | PG, Other | Continuous | Deterministic | Real data |
[36] | Dispatch | PG | Continuous | Deterministic | Simulated data |
[39] | Dispatch | AC, Other | Continuous | Deterministic | [46,47] |
[42] | DR, Microgrid | QL | Continuous | Deterministic | Real data, [48] |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[49] | Voltage control | Other | Continuous | Stochastic | IEEE 300, IEEE 9, [57] |
[50] | Voltage control | Other | Continuous | Stochastic | IEEE 300 |
[51] | Voltage control | AC | Continuous | Deterministic | IEEE 123 |
[52] | Voltage control | QL | Continuous | Stochastic | IEEE 39, [57] |
[53] | Microgrid | Other | Discrete | Deterministic | HIL platform “dSPACE MicroLabBox” |
[54] | Microgrid | PPO | Continuous | Stochastic | Empirical measurements |
[56] | Power flow, Microgrid | AC | Continuous | Stochastic | [58,59] |
[55] | Power flow | PG | Continuous | Stochastic | IEEE 118 |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[60] | HVAC | AC | Discrete | Deterministic | [67] |
[61] | HVAC | AC | Continuous | Stochastic | Real data [Upon request] |
[63] | HVAC | Other | Discrete | Deterministic | Simulated data |
[65] | HVAC | Other | Discrete | Deterministic | Simulated data |
[64] | HVAC | QL, Other | Discrete | Deterministic | [68] |
[62] | HVAC | PPO | Discrete | Stochastic | “EnergyPlus” |
[66] | Dispatch | QL, Other | Discrete | Deterministic | [69], Simulated data |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[70] | Power flow | QL | Discrete | Deterministic | MATLAB simulation |
[71] | Charge control | QL | Mixed | Deterministic | Historic prices |
[72] | Charge control | PG | Continuous | Deterministic | Simulated |
[73] | Charge control | AC | Continuous | Deterministic | Simulated |
[74] | Charge control | QL | Discrete | Deterministic | Open street map, ChargeBar |
[75] | Charge control | QL | Discrete | Deterministic | Simulated |
[77] | Charge scheduling | AC | Continuous | Deterministic | Simulated |
[78] | Charge control | AC | Continuous | Stochastic | Historic prices |
[76] | Load balancing | QL | Continuous | Deterministic | Simulated |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[87] | Smart Grid | QL | Discrete | Stochastic | Simulated data |
[79] | Smart Grid | Other | Discrete | Stochastic | Simulated data |
[85] | Smart Grid | Other | Discrete | Stochastic | [88,89] |
[81] | EV | QL | Discrete | Stochastic | Simulated data |
[90] | EV | QL, Other | Discrete | Deterministic | Simulated data |
[82] | EV | QL | Continuous | Deterministic | Simulated data |
[83] | EV | QL, Other | Discrete | Stochastic | Simulated data |
[80] | Renewable energy | Other [91] | Discrete | Stochastic | Simulated data |
[84] | Battery ESS, frequency support | PG, AC | Continuous | Deterministic | Simulated data |
[86] | Energy system modeling | Other | Discrete | Stochastic | Simulated data |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[92] | ET | PG | Continuous | Deterministic | Real data |
[93] | ET | QL | Discrete | Deterministic | [102,103] |
[95] | ET | PG | Continuous | Deterministic | [104] |
[96] | ET | AC | Continuous | Stochastic | Simulated data |
[97] | ET | QL | Continuous | Deterministic | Real data |
[99] | Microgrid, Dispatch | QL | Continuous | Stochastic | [105] |
[100] | Microgrid | PG | Continuous | Stochastic | Simulated data |
[101] | Microgrid | Other | Continuous | Deterministic | Real and Simulated data |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[106] | Voltage control | PG | Continuous | Stochastic | Powerflow & Short circuit Assessment Tool (PSAT), 200-bus system [115] |
[107] | Voltage control | AC | Continuous | Stochastic | IEEE 33-, 123-, and 342-node systems |
[108] | Voltage control | QL | Discrete | Stochastic | IEEE 14-bus system |
[109] | Frequency control | QL | Discrete | Deterministic | Simulated data |
[110] | Frequency control | PG | Discrete | Deterministic | Kundur’s 4-unit-13 bus system, New England 68-bus system, [116] |
[111] | Microgrid | QL | Continuous | Stochastic | 7-bus system and the IEEE 123-bus system |
[112] | Microgrid | Other | Discrete | Deterministic | Simulated data |
[113] | Power flow | PPO | Continuous | Stochastic | Illinois 200-bus system |
[114] | Power flow | PPO | Continuous | Stochastic | Simulated data, West Denmark wind data |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[117] | HVAC | QL | Discrete | Deterministic | Simulated data |
[125] | HVAC | PPO | Continuous | Deterministic | “EnergyPlus” |
[121] | HVAC | QL | Continuous | Stochastic | “EnergyPlus” |
[123] | HVAC | QL | Discrete | Deterministic | Simulated data |
[124] | HVAC | PG | Continuous | Stochastic | [128,129] |
[122] | HVAC | AC | Continuous | Stochastic | [130] |
[118] | HVAC, DR | PPO | Continuous | Deterministic | “EnergyPlus” |
[120] | HVAC, DR | QL, PG | Continuous | Stochastic | [130] |
[119] | DR | QL, PG | Discrete | Deterministic | [131] |
[126] | Dispatch | PG | Continuous | Deterministic | Simulated data |
[127] | Dispatch | QL | Continuous | Stochastic | [47,132] |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[133] | Scheduling | QL | Discrete | Deterministic | “Open street map”, “ChargeBar” |
[134] | Scheduling | QL | Continuous | Deterministic | Simulated |
[135] | Scheduling | Other | Continuous | Deterministic | Historic data |
[136] | Scheduling | Other | Continuous | Deterministic | Simulated |
[138] | Scheduling | QL | Discrete | Deterministic | “ElaadNL” |
[137] | Cost reduction | Other | Mixed | Deterministic | Simulated |
[139] | Cost reduction | QL | Continuous | Deterministic | Simulated |
[142] | Cost reduction | QL | Discrete | Deterministic | Simulated |
[141] | DR | QL | Discrete | Deterministic | Simulated |
[140] | SoC control | QL, PG | Continuous | Deterministic | Historic data |
Ref. | Application | Algorithm | State-space | Policy | Dataset |
---|---|---|---|---|---|
[145] | Microgrids | QL | Continuous | Deterministic | Simulated data |
[146] | Microgrids | QL | Discrete | Deterministic | Simulated data |
[149] | Microgrids | QL | Continuous | Deterministic | Simulated data |
[150] | Microgrids | QL | Continuous | Deterministic | [105] |
[148] | Frequency control | Other | Continuous | Deterministic | Simulated data |
[143] | Frequency control | PG, AC | Continuous | Deterministic | Simulated data |
[144] | Energy trading | QL | Continuous | Deterministic | [152] |
[147] | Energy trading | QL | Continuous | Deterministic | Simulated data |
[151] | EV | QL | Continuous | Stochastic | Simulated data |
RL expressions | Power systems application expressions |
---|---|
“model-based” | “energy market management” |
OR | OR |
“model learning” | “voltage control” |
OR | OR |
“model-free” | “frequency control” |
OR | OR |
“data-driven” | “reactive power control” |
AND/OR | OR |
“reinforcement learning” | “grid stability” |
| | OR |
| | “microgrid” |
| | OR |
| | “building energy management” |
| | OR |
| | “building” |
| | OR |
| | “electrical vehicles” |
| | OR |
| | “EV” |
| | OR |
| | “energy storage control problems” |
| | OR |
| | “battery energy storage system” |
| | OR |
| | “local energy trading” |
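
The two columns of search expressions can be combined into a single boolean query string. The sketch below is one possible reading: the table gives "AND/OR" before “reinforcement learning” and does not say how the two columns are joined, so using AND in both places is an assumption. The helper names `or_group` and `build_query` are hypothetical.

```python
# Terms copied from the search-expression table above.
RL_TERMS = ["model-based", "model learning", "model-free", "data-driven"]
RL_ANCHOR = "reinforcement learning"
APPLICATION_TERMS = [
    "energy market management",
    "voltage control",
    "frequency control",
    "reactive power control",
    "grid stability",
    "microgrid",
    "building energy management",
    "building",
    "electrical vehicles",
    "EV",
    "energy storage control problems",
    "battery energy storage system",
    "local energy trading",
]


def or_group(terms):
    """Join quoted terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query():
    """Assumed combination: RL terms AND the anchor AND application terms."""
    return (
        f"{or_group(RL_TERMS)} AND \"{RL_ANCHOR}\" "
        f"AND {or_group(APPLICATION_TERMS)}"
    )
```

`print(build_query())` emits the full string, which can then be adapted to the syntax of a particular bibliographic database.
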
Problem | QL MB | QL MF | PG MB | PG MF | AC MB | AC MF | PPO MB | PPO MF | Other MB | Other MF |
---|---|---|---|---|---|---|---|---|---|---|
ESS | 5 | 7 | 1 | 1 | 0 | 1 | 0 | 0 | 6 | 1 |
EV | 5 | 7 | 1 | 1 | 3 | 1 | 0 | 0 | 0 | 2 |
BEM | 2 | 6 | 0 | 4 | 2 | 1 | 1 | 2 | 3 | 0 |
GSAC | 1 | 3 | 1 | 2 | 2 | 1 | 1 | 2 | 3 | 1 |
EM | 2 | 3 | 2 | 3 | 3 | 1 | 0 | 0 | 3 | 2 |
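
The counts in this table can be aggregated per algorithm family to compare the MB and MF columns directly. The sketch below assumes that each algorithm (QL, PG, AC, PPO, Other) owns one MB column followed by one MF column, in that order, and that MB and MF stand for model-based and model-free; the function name `totals_by_algorithm` is hypothetical.

```python
ALGORITHMS = ("QL", "PG", "AC", "PPO", "Other")

# Rows copied from the table; each tuple holds (MB, MF) pairs in ALGORITHMS order.
COUNTS = {
    "ESS": (5, 7, 1, 1, 0, 1, 0, 0, 6, 1),
    "EV": (5, 7, 1, 1, 3, 1, 0, 0, 0, 2),
    "BEM": (2, 6, 0, 4, 2, 1, 1, 2, 3, 0),
    "GSAC": (1, 3, 1, 2, 2, 1, 1, 2, 3, 1),
    "EM": (2, 3, 2, 3, 3, 1, 0, 0, 3, 2),
}


def totals_by_algorithm(counts):
    """Sum the MB and MF counts of every application row, per algorithm."""
    totals = {alg: [0, 0] for alg in ALGORITHMS}
    for row in counts.values():
        for i, alg in enumerate(ALGORITHMS):
            totals[alg][0] += row[2 * i]      # MB column
            totals[alg][1] += row[2 * i + 1]  # MF column
    return {alg: tuple(pair) for alg, pair in totals.items()}


if __name__ == "__main__":
    for alg, (mb, mf) in totals_by_algorithm(COUNTS).items():
        print(f"{alg}: MB={mb}, MF={mf}")
```

The same dictionary can be summed per row to obtain the number of entries for each application area.
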
Category | Challenges |
---|---|
Lack of standardization | Real-world data for different control tasks in power systems is scarce. No high-quality simulator efficiently integrates accurate physical models of energy systems with reinforcement learning libraries. No standardized benchmark algorithms or datasets exist to serve as a quality norm for comparing reinforcement learning algorithms. |
Lack of generalization | Lack of data limits the generalization ability of model-free algorithms. Complex power system models are difficult to learn, so model-based algorithms converge to an inaccurate model that does not generalize well. As the number of state or action variables increases, the computational requirements of the model grow exponentially. |
Limited safety | Model-free methods produce suboptimal policies because little data is available, and these policies may perform poorly when unexpected events occur. The complexity of the environment’s dynamics causes model-based algorithms to produce suboptimal policies, jeopardizing system stability when uncertainty is encountered. During training, the models focus on exploration and take mainly random actions; in real-time power system applications this may be catastrophic and lead to blackouts. |
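
The exponential growth mentioned under "Lack of generalization" can be made concrete for the simplest case, a tabular method with uniform discretization. This is only an illustration under that assumption: the bin counts and the function name `q_table_entries` are hypothetical, and function-approximation methods scale differently.

```python
def q_table_entries(n_state_vars, n_action_vars,
                    bins_per_state_var=10, levels_per_action_var=5):
    """Number of (state, action) entries in a uniformly discretized table."""
    n_states = bins_per_state_var ** n_state_vars
    n_actions = levels_per_action_var ** n_action_vars
    return n_states * n_actions


if __name__ == "__main__":
    # Each extra state variable multiplies the table size by the bin count.
    for n in (2, 4, 8):
        print(n, q_table_entries(n_state_vars=n, n_action_vars=1))
```

With these assumed settings, every additional state variable multiplies the table size by 10, and every additional action variable multiplies it by 5.
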
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 MDPI (Basel, Switzerland) unless otherwise stated