Efficient experience replay based deep deterministic policy gradient for AGC dispatch in integrated energy system
Introduction
Commonly applied to achieve the secondary frequency control of interconnected power grids, AGC is capable of maintaining the frequency and tie-line exchange power of the system at their rated values [1]. In general, AGC involves two major control processes [2]. First, given the real-time deviations of frequency and tie-line exchange power as inputs, a controller, such as a PI controller, approximates the unexpected power disturbance on a real-time basis. Then, the regulation command is dispatched to each unit.
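The first control process described above can be sketched as an ACE-based PI controller. This is a minimal illustration, not the paper's model: the frequency bias B, the PI gains and the 4-second control cycle are assumed values.

```python
# Illustrative sketch of the first AGC control process: the area control
# error (ACE) combines the frequency and tie-line power deviations, and a
# PI controller converts it into a total power regulation command.
# B, Kp, Ki and the 4-second AGC cycle are assumed values.

def ace(delta_f, delta_p_tie, b=20.0):
    """ACE = delta_P_tie + B * delta_f (B: frequency bias, MW/Hz)."""
    return delta_p_tie + b * delta_f

class PIController:
    def __init__(self, kp=0.5, ki=0.05, dt=4.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def command(self, ace_value):
        """Total AGC regulation command from the current ACE."""
        self.integral += ace_value * self.dt
        return -(self.kp * ace_value + self.ki * self.integral)

ctrl = PIController()
# Under-frequency event: negative ACE yields a positive (raise) command.
cmd = ctrl.command(ace(delta_f=-0.05, delta_p_tie=-10.0))
```

The second process — splitting `cmd` among units — is exactly the dispatch problem this paper addresses.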
Traditional thermal or hydropower units are commonly used as AGC units [2]. However, owing to their long response delays and low generation ramp rates, they struggle to track the AGC power regulation command swiftly [3]. Consequently, interconnected power grids are likely to suffer from poor control performance, unqualified CPS indicators and similar problems, especially in control areas with a high renewable energy penetration rate but inadequate regulation resources [4].
To make fast-response units participate more actively in secondary frequency regulation, the Federal Energy Regulatory Commission (FERC) issued Order No. 755 in 2011, which put forward the performance-based frequency regulation market mechanism (also referred to as the frequency regulation market) for frequency regulation participants [5]. Under this mechanism, the payment to each AGC unit comprises two parts [6]: a capacity payment and a regulation performance/mileage payment, both of which are directly affected by regulation performance. Units earning high mileage payments are thereby incentivised to participate in AGC dispatch more actively, yielding better control performance. Like the power grid, the IES is a new integrated energy system containing various regulation units. Under the new mechanism, the key task is how to dispatch the regulation command effectively to each unit in the IES.
Some independent system operators (ISOs), including PJM [7] and China Southern Power Grid (CSG) [8], distribute frequency regulation resources based on the practical performance of diverse units. A novel mileage-based real-time optimal dispatch is presented in reference [9], which improves the Genco's profit. Reference [10] utilises a real-time AGC dispatch that lets fast-ramping AGC units take a larger share of the AGC power regulation command under high regulation mileage requirements. These methods are simple and practical, but they lack the specific optimisations needed to meet the ISO's requirement for comprehensive benefits in both control performance and regulation mileage payments. In fact, the ISO aims not only to balance the load disturbance as quickly as possible, but also to minimise regulation mileage payments. These two objectives conflict, because in the frequency regulation market an AGC unit with a higher ramp rate incurs a higher regulation mileage payment under the same AGC power regulation command [5]. To address the problems cited above, an integrated energy system AGC (IES-AGC) dispatch with various regulation units is proposed, taking both the AGC control performance and the regulation mileage payment into account. By controlling different regulation units, the best control performance can be obtained.
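The two-part payment can be made concrete with a small sketch. The prices and the performance score below are illustrative assumptions, not market values; mileage is computed, as is common, as the accumulated absolute movement of a unit's regulation setpoint.

```python
# Hedged sketch of the two-part payment under a performance-based
# frequency regulation market (FERC Order No. 755 style). Prices and the
# performance score are illustrative assumptions.

def regulation_mileage(setpoints):
    """Mileage: accumulated absolute change of the regulation setpoint (MW)."""
    return sum(abs(b - a) for a, b in zip(setpoints, setpoints[1:]))

def total_payment(capacity_mw, setpoints,
                  capacity_price=10.0, mileage_price=2.0, perf_score=0.95):
    capacity_payment = capacity_mw * capacity_price
    mileage_payment = regulation_mileage(setpoints) * mileage_price * perf_score
    return capacity_payment + mileage_payment

# A fast-ramping unit that tracks the same command more closely
# accumulates more mileage, hence a higher mileage payment.
payment = total_payment(capacity_mw=50, setpoints=[0, 10, -5, 8])
```

This makes the conflict in the text explicit: assigning commands to fast-ramping units improves tracking but raises the mileage term the ISO also wants to minimise.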
Taking the dynamic response of every unit into consideration, the proposed IES-AGC dispatch is a non-smooth nonlinear programming problem [11]. Traditional AGC dispatch methods include the genetic algorithm (GA), quadratic programming, the grey wolf optimiser (GWO) [12], [13], [14], proportional (PROP) dispatch [9], particle swarm optimisation (PSO) [15], moth-flame optimisation (MFO) [16], the whale optimisation algorithm (WOA) [17], the ant lion optimiser (ALO) [18], the dragonfly algorithm (DA) [19], the group search optimiser (GSO) [20], chicken swarm optimisation (CSO) [21], the sine cosine algorithm (SCA) [22], etc. However, these methods are slow, and as algorithm accuracy improves, the actual computation time can exceed the maximum time allowed by the generation order [23]. Nowadays, a variety of machine learning algorithms, including extensible deep learning [13], multi-objective reinforcement learning [24], decision trees, neural networks and clustering [25], have been applied to AGC dispatch owing to their perception and decision-making abilities. In reference [26], the authors propose a three-network double-delay actor-critic (TDAC) control method, which improves the system's control performance. In reference [27], an AGC dispatch framework based on hierarchical Q-learning is proposed, improving the control performance of micro-grids. In short, the above methods can deal with the randomness of the load and improve control performance; however, they can handle only a small number of controlled objects and converge slowly.
Compared with the algorithms mentioned above, the deep deterministic policy gradient (DDPG) in deep reinforcement learning offers better real-time performance and adaptability, and can output continuous actions, realising the continuous regulation of AGC units. To date, DDPG has not been applied to AGC dispatch.
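Two of DDPG's core mechanics — a deterministic actor producing continuous actions, with exploration noise during training, and target networks updated by Polyak averaging — can be sketched minimally. The linear "network" and all coefficients below are stand-ins for illustration, not the paper's implementation.

```python
import random

# Minimal DDPG-flavoured sketch: a deterministic actor outputs a
# continuous action, exploration noise is added during training, and
# target-network weights slowly track the online weights (Polyak
# averaging). The linear "network" and coefficients are stand-ins.

def actor(weights, state):
    """Deterministic policy: continuous action from a linear map."""
    return sum(w * s for w, s in zip(weights, state))

def explore(action, noise_std=0.1):
    """DDPG adds Gaussian noise to the deterministic action to explore."""
    return action + random.gauss(0.0, noise_std)

def soft_update(target, online, tau=0.005):
    """Polyak averaging: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * o + (1 - tau) * t for t, o in zip(target, online)]

online_w = [0.5, -0.2]
target_w = [0.0, 0.0]
a = actor(online_w, state=[1.0, 2.0])        # 0.5*1.0 - 0.2*2.0 = 0.1
noisy_a = explore(a)                         # training-time action
target_w = soft_update(target_w, online_w)   # slowly tracks online_w
```

The continuous action output is what lets DDPG-style agents regulate AGC units continuously rather than picking from a discrete command set.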
Based on the above, an IES-AGC dispatch model based on the multiple experience pool replay twin delayed deep deterministic policy gradient (MEPR-TD3) is proposed for the frequency regulation market. In MEPR-TD3, the experience pools are classified by the multi-experience-pool probability replay strategy, and samples are drawn from the different pools with different probabilities for training. This improves the agents' training efficiency and optimisation accuracy, allows the method to handle multiple control objects, and thus improves the control performance of IES-AGC dispatch.
The innovations of this paper are as follows:
- (1)
Previous research on AGC dispatch fails to meet the IES requirement of jointly optimising frequency regulation performance and economy; in particular, research on dual-objective AGC dispatch in the frequency regulation market has yet to produce ideal results. To solve this problem, IES-AGC dispatch is proposed, which realises comprehensive benefits in both control performance and economy. Compared with the original AGC dispatch framework, the IES-AGC dispatch has a larger optimisation space and can deploy more units, making it easier to obtain the optimal solution and optimal strategy.
- (2)
The improvements in MEPR-TD3 include delayed policy updating, target policy smoothing, twin critic networks and the multiple experience pool replay strategy. The replay strategy makes the IES-AGC dispatch strategy more effective: more important experience samples are selected for training with higher probability, and less important samples with lower probability. As a result, MEPR-TD3 converges quickly and realises continuous IES-AGC dispatch for multiple objects without tending to fall into local optima.
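The multiple-experience-pool probability replay idea can be sketched as follows. The pool criterion (absolute TD-error), the threshold and the sampling probabilities are illustrative assumptions, not the paper's exact settings.

```python
import random

# Illustrative sketch of multi-experience-pool probability replay:
# transitions are routed into pools by importance (here, by absolute
# TD-error), and each minibatch draws from the pools with fixed
# probabilities, so more important samples are replayed more often.
# The threshold and sampling probability are assumed values.

class MultiPoolReplay:
    def __init__(self, threshold=1.0, p_important=0.7):
        self.important, self.ordinary = [], []
        self.threshold, self.p_important = threshold, p_important

    def store(self, transition, td_error):
        pool = self.important if abs(td_error) > self.threshold else self.ordinary
        pool.append(transition)

    def sample(self, batch_size):
        batch = []
        for _ in range(batch_size):
            use_important = bool(self.important) and (
                random.random() < self.p_important or not self.ordinary)
            pool = self.important if use_important else self.ordinary
            batch.append(random.choice(pool))
        return batch

buf = MultiPoolReplay()
buf.store(("s", "a", 1.0, "s2"), td_error=2.5)   # -> important pool
buf.store(("s", "a", 0.1, "s2"), td_error=0.2)   # -> ordinary pool
batch = buf.sample(4)
```

Compared with a single uniform replay buffer, this biases training towards informative transitions while still visiting ordinary ones, which is the mechanism credited here for faster convergence.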
The remainder of this paper is organised as follows: the IES-AGC dispatch model under the frequency regulation market is presented in Section 2; the multiple experience pool replay twin delayed deep deterministic policy gradient algorithm is discussed in Section 3; Section 4 deals with the AGC system design based on MEPR-TD3; the case study results are analysed and discussed in Section 5; and conclusions are drawn in Section 6.
Section snippets
IES
The IES uses advanced energy management technology within a certain area to coordinate planning, optimised operation, collaborative management, interactive response and mutual complementarity by combining various resources within the area, such as coal, oil, gas, electrical power and thermal power. On the basis of large-scale distributed energy interconnection, a combined production system of thermal and electrical power is integrated to realise complementary,
Reinforcement learning
The goal of reinforcement learning is to seek an optimal policy that maximises the expected return [34]. In the actor-critic framework, the actor updates its network through the deterministic policy gradient (DPG):

∇θJ(θ) = E[∇aQπ(s,a)|a=μθ(s) ∇θμθ(s)]

where Qπ(s,a) is the critic function, which represents the expected return for state s and action a. Exploiting the algorithm's advantage for actions in continuous spaces, the random policy is turned into a deterministic policy:

at = μ(st|θμ)

where at
Design of action and state space
The studied IES-AGC dispatch system serves as a dynamic stochastic environment. The MEPR-TD3-based IES-AGC dispatch calculates the corresponding reward from the system state and uses the current environment state, together with the reward, as the input to the MEPR-TD3 IES-AGC dispatch. The strategy learns online, gives the optimal dispatch signal and outputs a set of continuous actions: the participation factors distributed to n−1 units. As to
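The action design above — the agent outputs n−1 participation factors and the n-th unit takes the remainder so the factors sum to one — can be sketched as follows. The clipping and renormalisation scheme is an assumption for illustration, not necessarily the paper's exact treatment.

```python
# Illustrative sketch of the action design: the agent outputs n-1
# participation factors in [0, 1]; the n-th unit receives the remainder
# so the factors sum to 1, and each unit's AGC command is its share of
# the total regulation command. The clipping scheme is an assumption.

def dispatch(total_command, raw_factors):
    """Split the total AGC command among n units from n-1 agent actions."""
    factors = [min(max(f, 0.0), 1.0) for f in raw_factors]
    s = sum(factors)
    if s > 1.0:                      # renormalise if the n-1 factors overshoot
        factors = [f / s for f in factors]
        s = 1.0
    factors.append(1.0 - s)          # n-th unit takes the remainder
    return [total_command * f for f in factors]

commands = dispatch(100.0, [0.3, 0.2, 0.1])  # approx. [30, 20, 10, 40] MW
assert abs(sum(commands) - 100.0) < 1e-9     # dispatch is power-balanced
```

Forcing the factors to sum to one guarantees the dispatched commands always add up to the total regulation command, whatever the agent outputs.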
Case studies
To verify the effectiveness of the MEPR-TD3 IES-AGC dispatch (hereinafter referred to as MEPR-TD3), the engineering AGC dispatch method PROP [29], as well as DDPG and TD3, are used as comparisons in simulation tests on a two-area LFC control model and a certain provincial IES. Three cases are considered, as below.
Conclusion
- 1)
Based on deep reinforcement learning, IES-AGC dispatch is proposed to balance the stochastic power disturbances of the IES. With a larger optimisation space, the IES-AGC dispatch can deploy more units, making it easier to obtain the optimal solution and hence the optimal strategy. In the proposed framework, four improvements are applied to MEPR-TD3; the multiple experience pool probability replay strategy is particularly effective. Experience pools are classified based on the strategy
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was jointly supported by National Natural Science Foundation of China (51777078, U2066212, 51907112).
References (40)
- et al. Design of performance-based frequency regulation market and its implementations in real-time operation. Int J Electr Power Energy Syst (2017)
- et al. Grey Wolf Optimizer. Adv Eng Softw (2014)
- et al. Wolf pack hunting strategy for automatic generation control of an islanding smart distribution network. Energy Convers Manage (2016)
- et al. Short term electric load forecasting by wavelet transform and grey model improved by PSO (particle swarm optimization) algorithm. Energy (2014)
- Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl-Based Syst (2015)
- The ant lion optimizer. Adv Eng Softw (2015)
- SCA: a sine cosine algorithm for solving optimization problems. Knowl-Based Syst (2016)
- et al. Smart generation control based on multi-agent reinforcement learning with the idea of the time tunnel. Energy (2018)
- et al. Greedy search based data-driven algorithm of centralized thermoelectric generation system under non-uniform temperature distribution. Appl Energy (2020)
- et al. Robust sliding-mode control of wind energy conversion systems for optimal power extraction via nonlinear perturbation observers. Appl Energy (2018)