Real-time energy purchase optimization for a storage-integrated photovoltaic system by deep reinforcement learning

https://doi.org/10.1016/j.conengprac.2020.104598

Highlights

  • Real-time energy purchases for a storage-integrated PV system are optimized.

  • Non-linearities and stochastic data are approached with deep reinforcement learning.

  • Deep reinforcement learning uses Q-learning combined with a deep neural network.

  • Charge control strategies arise from optimized grid energy purchases.

  • Simulation results on data from the Polish power system confirm effectiveness.

Abstract

The objective of this article is to minimize the cost of energy purchased on a real-time basis for a storage-integrated photovoltaic (PV) system installed in a microgrid. This is a complex task under non-linear storage charging/discharging characteristics as well as uncertain solar energy generation, demand, and market prices. It requires a proper tradeoff between storing too much and too little energy in the battery: in the former case future excess PV energy is lost, and in the latter case demand is exposed to future high electricity prices. We propose a reinforcement learning approach to deal with a non-stationary environment and non-linear storage characteristics. To make this approach applicable, a novel formulation of the decision problem is presented, which focuses on the optimization of grid energy purchases rather than on direct storage control. This limits the complexity of the state and action spaces, making it possible to achieve satisfactory learning speed and avoid stability issues. The Q-learning algorithm, combined with a dense deep neural network for function representation, is then used to learn an optimal decision policy. The algorithm incorporates enhancements found by prior work to improve learning speed and stability, such as experience replay, a target network, and an increasing discount factor. Extensive simulation results on real data confirm that our approach is effective and outperforms rule-based heuristics.

Introduction

We are on the verge of dramatic technological and cultural changes, caused by the shift from coal to renewable energy sources. With the development of smart grids, new control methods are required to support power generation and storage so as to meet energy demand. However, control decision-making in smart grids is more difficult compared with traditional power systems, because the operation of a high-renewables system is associated with more uncertainty (Zhang et al., 2018). One way to mitigate the unpredictability of renewables is the application of energy storage technologies in locally isolated microgrid areas. Still, the optimal operation of storage-integrated systems remains a challenge, given non-linear storage charging/discharging characteristics and uncertain conditions (Chauhan & Saini, 2014).

Even though a storage system can help manage the stochastic behavior of renewables, renewable energy may be insufficiently available for quite long periods, such as during cloudy or winter days in the case of solar photovoltaic (PV) systems. Therefore, a microgrid system cannot always operate in stand-alone mode; it must be supported by a grid energy supply to fulfill the critical load with conventional power. Real-time electricity prices are most likely to be applied in such a case, as argued by many economists (Dufo-López, 2015). The objective of optimal storage operation would then be to minimize the cost of energy purchased from the grid. The energy management processes under consideration are nonlinear, stochastic, and multi-period. The overall cost feedback is delayed relative to individual purchase decisions, because buying too little now may force additional purchases later on, and buying too much now may prevent savings later on when prices drop. This creates challenges that can hardly be addressed with simple ad hoc control strategies (Iovine et al., 2019).
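As a toy illustration of this delayed feedback (all figures below are invented for exposition and do not come from the article), consider a two-period purchase decision under an uncertain future price:

```python
# Hypothetical two-period example of the purchase tradeoff (all figures invented).
# The load needs 1 kWh in period 2; storage is assumed lossless for simplicity.
price_now = 0.10                                  # $/kWh in period 1
price_later_low, price_later_high = 0.05, 0.30    # equally likely period-2 prices

# Strategy A: buy 1 kWh now and store it until period 2.
cost_buy_ahead = 1.0 * price_now                  # 0.10 in every scenario

# Strategy B: wait and buy 1 kWh in period 2 at the realized price.
expected_cost_wait = 1.0 * 0.5 * (price_later_low + price_later_high)   # 0.175 on average

print(cost_buy_ahead, expected_cost_wait)
# Buying ahead is cheaper in expectation, yet if the low price materializes, waiting
# would have been better; the quality of the early decision is only revealed later,
# which is exactly the delayed feedback that complicates ad hoc control rules.
```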

The hybrid photovoltaic-electrical energy storage technology is the most popular installation in leading markets, as reviewed by Liu et al. (2019). Optimization of hybrid PV-storage systems has been extensively investigated to improve system performance (Arani et al., 2019). Ampatzis et al. (2017) explain how such systems can be used in a cluster managed by an aggregator to obtain demand response.

Numerous methods have been proposed to determine cost-minimizing real-time control of an integrated PV-storage system based on forecasting of load, renewables, and prices. These methods fall into the categories of dynamic programming (Li & Danzer, 2014), convex optimization (Wang et al., 2015), stochastic optimization (Conte et al., 2017), and the optimization of Lagrange multipliers (Nge et al., 2019). To reduce the dependency on forecasting accuracy, some researchers have proposed fuzzy logic controllers (Teo et al., 2018) or heuristic search methods, such as particle swarm optimization (Stoppato et al., 2014) refined by a genetic algorithm (Phan et al., 2018).

Reinforcement learning is an attractive paradigm for addressing stochastic optimal control problems. This approach, based on dynamic interactions and evaluative feedback, does not require forecasting models to be available. Some domain knowledge is usually still required, though, to properly design the learning control system, including its input and output representation as well as training information (Glavic et al., 2017).

There are only a handful of articles that use machine learning approaches to optimize energy storage control. One of the first studies employing adaptive dynamic programming, an approach closely related to reinforcement learning, is that of Wei et al. (2014), subsequently extended to account for PV generation (Wei et al., 2017). They determine an optimal battery charging/discharging/idle control law that minimizes the total expense of power from the grid under the assumptions of a periodic residential load and electricity rate. Lu et al. (2018) used reinforcement learning to solve a dynamic pricing demand response problem that takes into consideration the uncertainty of load demand. More recently, Henri and Lu (2019) used a supervised machine learning approach to control several different energy storage devices.

Deep reinforcement learning, a combination of deep learning and reinforcement learning, has recently been gaining a lot of attention. However, its applications in power systems and smart grids are still scarce (Zhang et al., 2018). An application of deep Q-learning to real-time scheduling of energy-consuming resources was presented by Zhang et al. (2017). Lee and Choi (2019) apply the Q-learning algorithm to schedule the energy consumption of home appliances, including an energy storage system. To the best of our knowledge, no study has applied deep reinforcement learning to energy control and cost minimization for a complex, stochastic system as considered in this article.

The core contribution of this work is the development of the architecture of a control system with an intelligent decision-making module using an advanced model-free learning algorithm. More specifically, the main achievements presented in the article are listed below.

  • The application of deep reinforcement learning as a control method for a complex system with nonlinear, stochastic, and multi-period processes.

  • A novel formulation of the energy storage control problem as one of making grid purchase decisions, which makes it possible to limit the action space to a relatively small number of discrete actions and to reduce the dimensionality of the state space (a minimal sketch is given after this list).

  • Successful demonstrations in realistic simulation experiments, in which the proposed algorithm, with appropriately tuned parameters, exhibits good learning speed and stability, and achieves better control quality than human-designed heuristic rules in several different environment configurations.
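As a rough illustration of the purchase-based formulation highlighted in the second contribution above, the following sketch assumes a small set of discrete purchase levels and a compact state vector; the particular features, levels, and names are illustrative assumptions, not the article's exact design.

```python
# Minimal sketch of a purchase-based decision formulation (assumed feature set and
# action levels; the article's exact encoding may differ).
import numpy as np

# Actions: how much energy to buy from the grid in the next interval, as a small
# discrete set of levels (kWh). Storage control then follows implicitly: any surplus
# of purchased + PV energy over demand is charged, any deficit is discharged.
PURCHASE_LEVELS_KWH = np.array([0.0, 0.5, 1.0, 2.0, 4.0])

def build_state(soc, pv_forecast, demand_forecast, price, hour_of_day):
    """Compact state vector: storage level plus a few exogenous signals."""
    return np.array([
        soc,                                       # state of charge, normalized to [0, 1]
        pv_forecast,                               # expected PV output next interval (kWh)
        demand_forecast,                           # expected load next interval (kWh)
        price,                                     # current real-time price ($/kWh)
        np.sin(2 * np.pi * hour_of_day / 24.0),    # time-of-day encoding
        np.cos(2 * np.pi * hour_of_day / 24.0),
    ], dtype=np.float32)

# With five discrete actions and a six-dimensional state, a single Q-network with five
# outputs suffices, which keeps learning tractable compared with directly controlling
# the charge/discharge power over a fine-grained continuous range.
```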

The remainder of this article is organized as follows. The adopted problem formulation is presented in Section 2. Section 3 describes the proposed reinforcement learning approach to optimizing energy purchase decisions. The results of a realistic experimental case study are presented in Section 4. Section 5 summarizes the main findings of this work and outlines some promising continuation directions.

Section snippets

Problem formulation

The results presented in this article are based on a realistic simulation of a storage-integrated solar power system. The assumed system architecture matches standard microgrid infrastructure, and real insolation, energy consumption, and energy price data are used to perform the simulation.
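To make this setup concrete, the following is a minimal sketch of how one interval of such a simulation could be advanced, assuming a simple constant-efficiency storage model; the article's actual non-linear storage characteristics, limits, and data handling are not reproduced here.

```python
# One simulation step of a storage-integrated PV system (illustrative only; the
# article's storage model, efficiencies, and limits are not reproduced here).

def step(soc, purchase, pv, demand, price,
         capacity=10.0, eta_charge=0.95, eta_discharge=0.95):
    """Advance the battery state by one interval and return the purchase cost.

    soc       -- stored energy at the start of the interval (kWh)
    purchase  -- energy bought from the grid during the interval (kWh)
    pv        -- PV generation during the interval (kWh)
    demand    -- load during the interval (kWh)
    price     -- real-time price ($/kWh)
    """
    balance = purchase + pv - demand              # net energy after serving the load
    if balance >= 0.0:
        # Surplus is charged, limited by the remaining capacity; excess PV is lost.
        soc_next = min(capacity, soc + eta_charge * balance)
    else:
        # Deficit is covered from storage; if storage runs out, extra grid energy
        # must be bought at the current price (forced purchase).
        needed = -balance / eta_discharge
        drawn = min(soc, needed)
        soc_next = soc - drawn
        shortfall = (needed - drawn) * eta_discharge
        purchase += shortfall
    cost = purchase * price
    return soc_next, cost
```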

Solution method

In the paradigm of reinforcement learning, a learning agent learns to perform its task through interactions with its environment. At each time step, it observes the current state of the environment and performs an action. It then receives a reinforcement value, also called a reward, and a state transition takes place. State transitions and reinforcement values may, in general, be stochastic, and the agent knows neither their underlying distributions nor their expected values. The objective of
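As a minimal sketch of the learning machinery named in the abstract (Q-learning with a dense deep neural network, experience replay, and a target network), the following update step uses illustrative hyperparameters and network sizes that are assumptions rather than the article's tuned configuration.

```python
# Minimal deep Q-learning update with experience replay and a target network
# (illustrative hyperparameters; not the article's tuned configuration).
import copy
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

n_state, n_actions = 6, 5          # assumed sizes, matching the earlier state/action sketch

q_net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))
target_net = copy.deepcopy(q_net)              # frozen copy, synchronized periodically
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=50_000)                  # experience replay buffer of transitions

def train_step(batch_size=64, gamma=0.99):
    """One Q-learning update on a random minibatch drawn from the replay buffer."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    s = torch.as_tensor(np.array(states), dtype=torch.float32)
    a = torch.as_tensor(actions, dtype=torch.int64)
    r = torch.as_tensor(rewards, dtype=torch.float32)
    s2 = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    done = torch.as_tensor(dones, dtype=torch.float32)
    with torch.no_grad():
        # Bootstrapped target computed with the frozen target network.
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Periodically: target_net.load_state_dict(q_net.state_dict()).
# The discount factor gamma can also be scheduled to increase over training, which
# corresponds to the "increasing discount factor" enhancement mentioned in the abstract.
```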

Case study

To verify the effectiveness and performance of the proposed Q-learning approach, several test cases and comparisons were studied.

Conclusions

The combination of reinforcement learning with deep neural networks has become one of the most promising areas in the field of artificial intelligence. Successful applications to practical real-time control problems are still hard to find, though. This article managed to overcome difficulties anticipated by theory and observed in prior experimental work, such as insufficient learning speed and poor stability. We believe the achieved performance can be primarily

CRediT authorship contribution statement

Waldemar Kolodziejczyk: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Izabela Zoltowska: Supervision, Research methodology, Writing - original draft, Writing - review & editing. Pawel Cichosz: Supervision, Research methodology, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (53)

  • Stoppato, A., et al. A PSO (particle swarm optimization)-based model for the optimal management of a small PV (photovoltaic)-pump hydro energy storage in a rural dry area. Energy (2014).

  • Urtasun, A., et al. State-of-charge-based droop control for stand-alone AC supply systems with distributed energy storage. Energy Conversion and Management (2015).

  • Ampatzis, M., et al. Robust optimisation for deciding on real-time flexibility of storage-integrated photovoltaic units controlled by intelligent software agents. IET Renewable Power Generation (2017).

  • Anderson, C. W. Strategy learning with multilayer connectionist representations.

  • Arani, A. K., et al. Review on energy storage systems control methods in microgrids. International Journal of Electrical Power & Energy Systems (2019).

  • Arulkumaran, K., et al. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine (2017).

  • Baird, L. C. Residual algorithms: Reinforcement learning with function approximation.

  • Barto, A. G., et al. On the computational economics of reinforcement learning.

  • Barto, A. G., et al. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics (1983).

  • Bellman, R. E. Dynamic programming (1957).

  • Boyan, J., et al. Generalization in reinforcement learning: Safely approximating the value function.

  • Caelen, O., et al. Improving the exploration strategy in bandit algorithms.

  • Cichosz, P. Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning. Journal of Artificial Intelligence Research (1995).

  • Cichosz, P. An analysis of experience replay in temporal difference learning. Cybernetics and Systems (1999).

  • Crites, R. H., et al. Improving elevator performance using reinforcement learning.

  • François-Lavet, V., et al. How to discount deep reinforcement learning: Towards new dynamic strategies (2015).