
Applied Energy

Volume 285, 1 March 2021, 116386

Efficient experience replay based deep deterministic policy gradient for AGC dispatch in integrated energy system

https://doi.org/10.1016/j.apenergy.2020.116386

Highlights

  • A new AGC dispatch model in the integrated energy system is introduced.

  • The MEPR-TD3 algorithm is proposed to handle AGC dispatch.

  • The proposed method can be used in a performance-based frequency regulation market.

  • The method achieves comprehensive optimum in control performance and economy profit.

Abstract

To balance stochastic power disturbances in an integrated energy system (IES), a novel automatic generation control (AGC) dispatch is proposed that takes into account the regulation rules of a performance-based frequency regulation market, with the aim of reducing both the area control deviation and the regulation mileage payment while complying with the constraints of the various regulation units. To this end, a multiple experience pool replay twin delayed deep deterministic policy gradient (MEPR-TD3) algorithm is put forward to improve training efficiency and action quality via four improvements, including a multiple experience pool probability replay strategy. Finally, the performance of the proposed algorithm is verified on an extended two-area load frequency control (LFC) model and on the Hainan province IES with different multi-energy demands.

Introduction

Commonly applied to achieve the secondary frequency control of interconnected power grids, AGC is capable of maintaining the frequency and tie-line exchange power of the system at their rated values [1]. In general, AGC involves two major control processes [2]. First, given the real-time deviation inputs of frequency and tie-line exchange power, the unexpected power disturbances are estimated in real time by a controller, such as a PI controller. Then, the regulation command is dispatched to each unit.
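
The two-step process described above can be sketched as follows. The model, gain values and signal names are illustrative assumptions for this sketch, not values taken from the paper.

```python
# Minimal sketch of the two AGC control steps: (1) a PI controller acting
# on the area control error (ACE) estimates the total regulation power,
# and (2) that command is dispatched to units via participation factors.
# All numeric values are assumed for illustration.

def area_control_error(delta_f, delta_p_tie, beta=20.0):
    """ACE combines the tie-line power deviation with the frequency
    deviation weighted by the area's frequency bias factor beta."""
    return delta_p_tie + beta * delta_f

class PIController:
    """Step 1: a PI controller estimates the total regulation power
    needed to cancel the unexpected disturbance."""
    def __init__(self, kp=0.5, ki=0.1, dt=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def command(self, ace):
        self.integral += ace * self.dt
        return -(self.kp * ace + self.ki * self.integral)

def dispatch(total_command, participation_factors):
    """Step 2: the total command is split among the AGC units according
    to their participation factors (assumed here to sum to 1)."""
    return [alpha * total_command for alpha in participation_factors]

ctrl = PIController()
ace = area_control_error(delta_f=-0.05, delta_p_tie=0.2)
cmd = ctrl.command(ace)
per_unit = dispatch(cmd, [0.5, 0.3, 0.2])
```

The participation factors in step 2 are exactly the quantities that an AGC dispatch strategy tunes; the rest of the paper is concerned with choosing them well.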

Traditional thermal power and hydropower units are commonly used as AGC units [2]. However, due to their long response delays and low generation ramp rates, they find it difficult to track the AGC power regulation command swiftly [3]. Consequently, interconnected power grids are likely to encounter problems such as poor control performance and unqualified CPS indicators, especially in control areas with a high renewable energy penetration rate but inadequate regulation resources [4].

To encourage fast-response units to participate more actively in secondary frequency regulation, the Federal Energy Regulatory Commission (FERC) issued Order No. 755 in 2011, which put forward a performance-based frequency regulation market mechanism (also referred to as the frequency regulation market) for frequency regulation participants [5]. Under this mechanism, the payment for every AGC unit comprises two parts [6]: the capacity payment and the regulation performance/mileage payment, both of which are directly affected by the regulation performance. This enables AGC units with high mileage payments to participate in AGC dispatch more actively, thereby obtaining better control performance. Like the power grid, the IES is an integrated energy system containing various regulation units. Under the new mechanism, the most important task is how to effectively dispatch the regulation command to each unit in the IES.

Some independent system operators (ISOs), including PJM [7] and China Southern Power Grid (CSG) [8], distribute frequency regulation resources based on the practical performance of diverse units. A novel mileage-based real-time optimal dispatch is presented in reference [9], which improves the GenCo's profit. In reference [10], a real-time AGC dispatch is employed that allocates a larger AGC power regulation command to fast-ramping units under high-frequency regulation mileage requirements. These methods are simple and practical, but they lack the specific optimisations needed to meet the ISO's requirement for a comprehensive balance between control performance and regulation mileage payments. In fact, an ISO aims not only to balance load disturbances as quickly as possible, but also to minimise regulation mileage payments. These two objectives conflict, because in the frequency regulation market an AGC unit with a higher ramp rate incurs a higher regulation mileage payment for the same AGC power regulation command [5]. To address the problems cited above, an integrated energy system AGC (IES-AGC) dispatch with various regulation units is proposed, taking both the AGC control performance and the regulation mileage payment into account. By controlling the different regulation units, the best control performance can be obtained.
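
The conflict between the two objectives can be illustrated numerically. In the sketch below, regulation mileage is taken as the total absolute movement of a unit's setpoint; the ramp model, price and command sequence are assumptions for illustration only, not the paper's or any market's settlement rules.

```python
# Illustrative sketch of the trade-off described above: a fast-ramping
# unit tracks the AGC command closely (better control performance) but
# accumulates more regulation mileage, and hence a higher mileage
# payment. All formulas and numbers are illustrative assumptions.

def track(command, ramp_limit, dt=1.0):
    """Follow a command sequence subject to a per-step ramp limit."""
    output, p = [], 0.0
    for c in command:
        step = max(-ramp_limit * dt, min(ramp_limit * dt, c - p))
        p += step
        output.append(p)
    return output

def mileage(setpoints):
    """Regulation mileage: total absolute movement of the unit."""
    return sum(abs(b - a) for a, b in zip([0.0] + setpoints, setpoints))

command = [1.0, -1.0, 1.0, -1.0]        # oscillating AGC command (MW)
fast = track(command, ramp_limit=2.0)   # fast unit tracks the command fully
slow = track(command, ramp_limit=0.5)   # slow unit lags behind

price = 10.0                            # assumed mileage price ($/MW)
fast_payment = price * mileage(fast)
slow_payment = price * mileage(slow)
```

Here the fast unit delivers far better tracking but accumulates 3.5 times the mileage of the slow unit, so it must be paid substantially more, which is exactly the tension the proposed dual-objective dispatch is designed to manage.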

Taking the dynamic response process of every unit into consideration, the proposed IES-AGC dispatch is a non-smooth nonlinear programming problem [11]. Traditional AGC dispatch methods include the genetic algorithm (GA), quadratic programming, the grey wolf optimiser (GWO) [12], [13], [14], proportional (PROP) dispatch [9], particle swarm optimisation (PSO) [15], moth-flame optimisation (MFO) [16], the whale optimisation algorithm (WOA) [17], the ant lion optimiser (ALO) [18], the dragonfly algorithm (DA) [19], the group search optimiser (GSO) [20], chicken swarm optimisation (CSO) [21], the sine cosine algorithm (SCA) [22], etc. However, these methods are slow, and as the accuracy of an algorithm improves, its actual calculation time can exceed the maximum time allowed for the generation order [23]. Nowadays, a variety of machine learning algorithms, including extensible deep learning [13], multi-objective reinforcement learning [24], decision trees, neural networks and clustering techniques [25], have been applied to AGC dispatch for their perception and decision-making abilities. In reference [26], the authors proposed a three-network double-delay actor-critic (TDAC) control method, which improves the system control performance. In reference [27], an AGC dispatch framework based on hierarchical Q-learning was proposed, improving the control performance of micro-grids. In short, the above methods can deal with the randomness of the load and improve the control performance. However, the number of objects that can be controlled with these methods is small, and the algorithms converge slowly.

Compared with the algorithms mentioned above, the deep deterministic policy gradient (DDPG) in deep reinforcement learning offers better real-time performance and adaptability, and it can output continuous actions, thereby realising continuous regulation of AGC units. To date, the DDPG has not been applied to the field of AGC dispatch.

Based on the above, an IES-AGC dispatch model based on MEPR-TD3 is proposed for the frequency regulation market. The experience pools in MEPR-TD3 are classified by the multiple experience pool probability replay strategy, and samples are drawn from the different pools with different probabilities for training. This improves the training efficiency and optimisation accuracy of the agents, allows the method to handle multiple control objects, and thus improves the control performance of IES-AGC dispatch.

The innovations of this paper are set forth in the following points:

  • (1)

Previous research on AGC dispatch has failed to meet the IES requirement for comprehensive optimisation of frequency regulation performance and economy. In particular, research on dual-objective optimisation of AGC dispatch in the frequency regulation market has not produced ideal results. To solve this problem, the IES-AGC dispatch is proposed, which realises comprehensive benefits in both control performance and economy. Compared with the original AGC dispatch framework, the IES-AGC dispatch has a larger optimisation space and can deploy more units, making it easier to obtain the optimal solution and the optimal strategy.

  • (2)

The improvements in MEPR-TD3 include delayed policy updates, target policy smoothing, twin critic networks and the multiple experience pool replay strategy. The multiple experience pool replay strategy makes the IES-AGC dispatch strategy more effective: more important experience samples are selected for training with a greater probability, and less important samples with a smaller probability. As a result, MEPR-TD3 offers fast convergence and realises continuous IES-AGC dispatch for multiple objects without tending toward local optima.
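
The multiple experience pool replay strategy in point (2) can be sketched as follows. The two-pool split, the reward-based classification rule and the sampling probability are assumptions made for this illustration; the paper's own classification criterion may differ.

```python
import random
from collections import deque

# Sketch of a multiple-experience-pool probability replay strategy:
# transitions are sorted into pools by an importance criterion (here,
# simply the sign of the reward), and each pool is sampled with a fixed
# probability, so "more important" experience is replayed more often.

class MultiPoolReplay:
    def __init__(self, capacity=10_000, p_important=0.7, threshold=0.0):
        self.important = deque(maxlen=capacity)
        self.ordinary = deque(maxlen=capacity)
        self.p_important = p_important   # chance of drawing from the important pool
        self.threshold = threshold       # reward threshold for classification

    def store(self, transition):
        # Classify by reward: high-reward transitions go to the
        # "important" pool (assumed criterion).
        _, _, reward, _ = transition
        pool = self.important if reward > self.threshold else self.ordinary
        pool.append(transition)

    def sample(self, batch_size):
        batch = []
        for _ in range(batch_size):
            use_important = (random.random() < self.p_important
                             and len(self.important) > 0) or not self.ordinary
            pool = self.important if use_important else self.ordinary
            batch.append(random.choice(pool))
        return batch

buffer = MultiPoolReplay()
for i in range(100):
    # (state, action, reward, next_state) with mixed rewards
    buffer.store((i, 0.0, 1.0 if i % 2 else -1.0, i + 1))
batch = buffer.sample(32)
```

Compared with a single uniform buffer, the fixed per-pool probabilities bias each training batch toward informative transitions without the bookkeeping cost of fully prioritised replay.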

The major content of this paper is as follows: the model of IES-AGC dispatch in light of the frequency regulation market is presented in Section 2, followed by a discussion of the multiple experience pool replay twin delayed deep deterministic policy gradient algorithm in Section 3. Section 4 deals with the AGC system design based on MEPR-TD3. In Section 5, the case study results are analysed and discussed, ending with the conclusion in Section 6.

Section snippets

IES

The IES uses advanced energy management technology within a certain area and helps coordinate planning, optimised operation, collaborative management, interactive response and mutual complementation by combining various resources within the area, such as coal, oil, gas, electrical power and thermal power. On the basis of large-scale distributed energy interconnection, a combined production system of thermal power and electrical power is integrated to realise complementary,

Reinforcement learning

The goal of reinforcement learning is to seek the best policy that maximises the expected return value [34]. In the actor-critic framework, the actor updates the network through the deterministic policy gradient (DPG):

$$\nabla_\phi J(\phi) = \mathbb{E}_{s \sim p_\tau}\!\left[\left.\nabla_a Q^\pi(s,a)\right|_{a=\pi(s)} \nabla_\phi \pi_\phi(s)\right]$$

where $Q^\pi(s,a)$ is the critic function, which represents the expected return value given state $s$ and action $a$. This exploits the algorithm's advantage of acting in continuous spaces and turns a stochastic policy into a deterministic policy as follows:

$$a_t = \pi_\theta(s_t \mid \theta^\pi)$$

where $a_t$
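
The deterministic policy gradient can be illustrated with a one-dimensional toy example. The linear policy, the known quadratic critic and all numeric values below are assumptions chosen so the gradient is analytic; real DDPG/TD3 learns the critic as well.

```python
# Toy illustration of the deterministic policy gradient
#   grad_phi J = E[ grad_a Q(s, a)|_{a = pi(s)} * grad_phi pi_phi(s) ]
# using a linear policy pi_phi(s) = phi * s and a known quadratic critic
# Q(s, a) = -(a - A_STAR * s)**2, whose maximiser is a = A_STAR * s.

A_STAR = 2.0                    # coefficient of the optimal action
STATES = [0.5, 1.0, 1.5, 2.0]   # sampled states standing in for p_tau

def grad_a_Q(s, a):
    return -2.0 * (a - A_STAR * s)   # dQ/da for the assumed critic

def dpg_step(phi, lr=0.05):
    # Monte-Carlo estimate of the DPG over the sampled states:
    # grad_a Q evaluated at a = pi(s), times grad_phi pi_phi(s) = s.
    grad = sum(grad_a_Q(s, phi * s) * s for s in STATES) / len(STATES)
    return phi + lr * grad           # gradient *ascent* on J

phi = 0.0
for _ in range(200):
    phi = dpg_step(phi)
# phi converges toward A_STAR, i.e. the policy pi_phi matches the
# critic's maximising action at every sampled state
```

Because the critic is quadratic in the action, each step contracts the error (phi - A_STAR) by a constant factor, so the iteration converges geometrically, which is the behaviour the DPG update aims for in the general case.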

Design of action and state space

The IES-AGC dispatch system studied serves as the dynamic stochastic environment. The MEPR-TD3-based IES-AGC dispatch calculates the corresponding reward based on the system state, and uses the current system environment state quantities together with the reward as inputs to the MEPR-TD3 IES-AGC dispatch. The strategy performs online learning, gives the optimal dispatch signal and outputs a set of continuous actions. The actions refer to the participation factors distributed to n-1 units. As to
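
The (n-1)-dimensional action design can be sketched as below: the agent outputs participation factors for n-1 units and the remaining unit's factor is derived so that the factors sum to one. The clipping and renormalisation details are assumptions for this sketch, not taken from the paper.

```python
# Sketch of the action design described above: an (n-1)-dimensional
# continuous action is mapped to n participation factors summing to 1,
# so the full regulation command is always covered. The clipping and
# renormalisation scheme is an assumed implementation detail.

def to_participation_factors(action, n_units):
    """Map an (n-1)-dimensional action in [0, 1] to n factors summing to 1."""
    assert len(action) == n_units - 1
    clipped = [min(max(a, 0.0), 1.0) for a in action]
    total = sum(clipped)
    if total > 1.0:                     # renormalise if over-allocated
        clipped = [a / total for a in clipped]
        total = 1.0
    return clipped + [1.0 - total]      # last unit takes the remainder

factors = to_participation_factors([0.4, 0.3, 0.1], n_units=4)
commands = [f * 100.0 for f in factors]  # split a 100 MW command
```

Deriving the last factor rather than learning all n keeps the action space one dimension smaller while guaranteeing by construction that the dispatched commands always sum to the total regulation command.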

Case studies

To demonstrate the effectiveness of the MEPR-TD3 IES-AGC dispatch (hereinafter referred to as MEPR-TD3), the engineering method for AGC dispatch, PROP [29], together with DDPG and TD3, are used as comparison methods in simulation tests on a two-area LFC model and on a provincial IES. There are three cases, as below.

Conclusion

  • 1)

Based on deep reinforcement learning, the IES-AGC dispatch is proposed to balance the stochastic power disturbance of the IES. The IES-AGC dispatch has a larger optimisation space and can deploy more units, making it easier to obtain the optimal solution and then the optimal strategy. In the proposed framework, four improvements are applied in MEPR-TD3; the strategy of multiple experience pool probability replay is particularly effective. Experience pools are classified based on the strategy

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was jointly supported by National Natural Science Foundation of China (51777078, U2066212, 51907112).

References (40)

  • F.A. Darvish

    Robust and intelligent type-2 fuzzy Fractional-Order Controller-Based automatic generation control to enhance the damping performance of Multi-Machine power systems

    Iete J Res

    (2020)
  • F.A. Darvish

    Optimal fractional order BELBIC to ameliorate small signal stability of interconnected hybrid power system

    Environ Prog Sustain

    (2019)
  • F.A. Darvish

    An innovative OANF–IPFC based on MOGWO to enhance participation of DFIG-based wind turbine in interconnected reconstructed power system

    Soft Comput

    (2019)
  • A.D. Falehi et al.

    Neoteric HANFISC-SSSC based on MOPSO technique aimed at oscillation suppression of interconnected multi-source power systems

    IET Gener Transm Distrib

    (2016)
  • U.S. Federal Energy Regulatory Commission, Washington, DC, USA, FERC 755, Dockets RM11-7-000 AD10-11-000, Oct....
  • X. Zhang et al.

    Adaptive distributed auction-based algorithm for optimal mileage based AGC dispatch with high participation of renewable energy

    Int. J. Electr. Power Energy Syst.

    (2020)
  • PJM, Docket No. ER12-1204-001, Mar. 5, 2012 [Online]. Available:...
  • Y. Xichang et al.

    Practical implementation of the SCADA+AGC/EDC system of the Hunan power pool in the central China power network

    IEEE Trans Energy Convers

    (1994)
  • X. Zhang et al.

    Optimal Mileage Based AGC Dispatch of a GenCo

    IEEE Trans. Power Syst.

    (2020)
  • X. Zhang et al.

    Lifelong learning for complementary generation control of interconnected power grids with high-penetration renewables and EVs

    IEEE Trans Power Syst

    (2018)