Real-time energy purchase optimization for a storage-integrated photovoltaic system by deep reinforcement learning

https://doi.org/10.1016/j.conengprac.2020.104598

Highlights

  • Real-time energy purchases for a storage-integrated PV system are optimized.

  • Non-linearities and stochastic data are approached with deep reinforcement learning.

  • Deep reinforcement learning uses Q-learning combined with a deep neural network.

  • Charge control strategies arise from optimized grid energy purchases.

  • Simulation results on data from the Polish power system confirm effectiveness.

Abstract

The objective of this article is to minimize the cost of energy purchased on a real-time basis for a storage-integrated photovoltaic (PV) system installed in a microgrid. This is a complex task under non-linear storage charging/discharging characteristics as well as uncertain solar energy generation, demand, and market prices. It requires a proper tradeoff between storing too much and too little energy in the battery: in the former case future excess PV energy is lost, and in the latter case demand is exposed to future high electricity prices. We propose a reinforcement learning approach to deal with a non-stationary environment and non-linear storage characteristics. To make this approach applicable, a novel formulation of the decision problem is presented, which focuses on the optimization of grid energy purchases rather than on direct storage control. This limits the complexity of the state and action spaces, making it possible to achieve satisfactory learning speed and avoid stability issues. The Q-learning algorithm, combined with a dense deep neural network for function representation, is then used to learn an optimal decision policy. The algorithm incorporates enhancements found by prior work to improve learning speed and stability, such as experience replay, a target network, and an increasing discount factor. Extensive simulation results on real data confirm that our approach is effective and outperforms rule-based heuristics.

Introduction

We are on the verge of dramatic technological and cultural changes, caused by the shift from coal to renewable energy sources. With the development of smart grids, new control methods are required to support power generation and storage so as to meet energy demand. However, control decision-making in smart grids is more difficult compared with traditional power systems, because the operation of a high-renewables system is associated with more uncertainty (Zhang et al., 2018). One way to mitigate the unpredictability of renewables is the application of energy storage technologies in locally isolated microgrid areas. Still, the optimal operation of storage-integrated systems remains a challenge, given non-linear storage charging/discharging characteristics and uncertain conditions (Chauhan & Saini, 2014).

Even though a storage system can help manage the stochastic behavior of renewables, renewable energy may be insufficiently available for quite long periods, such as during cloudy or winter days in the case of solar photovoltaic (PV) systems. Therefore, a microgrid system cannot always operate in stand-alone mode; it must be supported by a grid energy supply to fulfill the critical load with conventional power. Real-time electricity prices are most likely to be applied in such a case, as argued by many economists (Dufo-López, 2015). The objective of optimal storage operation would then be to minimize the cost of energy purchased from the grid. The energy management processes under consideration are nonlinear, stochastic, and multi-period. The overall cost feedback is delayed relative to individual purchase decisions, because buying too little now may force additional purchases later on, and buying too much now may prevent savings later on when prices drop. This creates challenges that can hardly be addressed with simple ad hoc control strategies (Iovine et al., 2019).
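As a toy illustration of this delayed feedback (all figures below are invented for exposition and do not come from the article), consider a two-period purchase decision under an uncertain future price:

```python
# Hypothetical two-period example of the purchase tradeoff (all figures invented).
# The load needs 1 kWh in period 2; storage is assumed lossless for simplicity.
price_now = 0.10                                  # $/kWh in period 1
price_later_low, price_later_high = 0.05, 0.30    # equally likely period-2 prices

# Strategy A: buy 1 kWh now and store it until period 2.
cost_buy_ahead = 1.0 * price_now                  # 0.10 in every scenario

# Strategy B: wait and buy 1 kWh in period 2 at the realized price.
expected_cost_wait = 1.0 * 0.5 * (price_later_low + price_later_high)   # 0.175 on average

print(cost_buy_ahead, expected_cost_wait)
# Buying ahead is cheaper in expectation, yet if the low price materializes, waiting
# would have been better; the quality of the early decision is only revealed later,
# which is exactly the delayed feedback that complicates ad hoc control rules.
```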

The hybrid photovoltaic-electrical energy storage technology is the most popular installation in leading markets, as reviewed by Liu et al. (2019). Optimization of hybrid PV-storage systems has been extensively investigated to improve system performance (Arani et al., 2019). Ampatzis et al. (2017) explain how such systems can be used in a cluster managed by an aggregator to obtain demand response.

Numerous methods have been proposed to determine cost-minimizing real-time control of an integrated PV-storage system based on forecasting of load, renewables, and prices. These methods fall into the categories of dynamic programming (Li & Danzer, 2014), convex optimization (Wang et al., 2015), stochastic optimization (Conte et al., 2017), and the optimization of Lagrange multipliers (Nge et al., 2019). To reduce the dependency on forecasting accuracy, some researchers have proposed fuzzy logic controllers (Teo et al., 2018) or heuristic search methods, such as particle swarm optimization (Stoppato et al., 2014) refined by a genetic algorithm (Phan et al., 2018).

Reinforcement learning is an attractive paradigm for addressing stochastic optimal control problems. This approach, based on dynamic interactions and evaluative feedback, does not require forecasting models to be available. Some domain knowledge is usually still required, though, to properly design the learning control system, including its input and output representation as well as training information (Glavic et al., 2017).

There are only a handful of articles that use machine learning approaches to optimize energy storage control. One of the first studies employing adaptive dynamic programming, an approach closely related to reinforcement learning, is that of Wei et al. (2014), subsequently extended to account for PV generation (Wei et al., 2017). They determine an optimal battery charging/discharging/idle control law that minimizes the total expense of power from the grid under the assumptions of a periodic residential load and electricity rate. Lu et al. (2018) used reinforcement learning to solve a dynamic pricing demand response problem that takes into consideration the uncertainty of load demand. More recently, Henri and Lu (2019) used a supervised machine learning approach to control several different energy storage devices.

Deep reinforcement learning, a combination of deep learning and reinforcement learning, has recently been gaining a lot of attention. However, its applications in power systems and smart grids are still scarce (Zhang et al., 2018). An application of deep Q-learning to real-time scheduling of energy-consuming resources was presented by Zhang et al. (2017). Lee and Choi (2019) apply the Q-learning algorithm to schedule the energy consumption of home appliances, including an energy storage system. To the best of our knowledge, no study has applied deep reinforcement learning to energy control and cost minimization for a complex, stochastic system as considered in this article.

The core contribution of this work is the development of the architecture of a control system with an intelligent decision-making module using an advanced model-free learning algorithm. More specifically, the main achievements presented in the article are listed below.

  • The application of deep reinforcement learning as a control method for a complex system with nonlinear, stochastic, and multi-period processes.

  • A novel formulation of the energy storage control problem as one of making grid purchase decisions, which makes it possible to limit the action space to a relatively small number of discrete actions and to reduce the dimensionality of the state space (a minimal sketch is given after this list).

  • Successful demonstrations in realistic simulation experiments, in which the proposed algorithm, with appropriately tuned parameters, exhibits good learning speed and stability, and achieves better control quality than human-designed heuristic rules in several different environment configurations.
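As a rough illustration of the purchase-based formulation highlighted in the second contribution above, the following sketch assumes a small set of discrete purchase levels and a compact state vector; the particular features, levels, and names are illustrative assumptions, not the article's exact design.

```python
# Minimal sketch of a purchase-based decision formulation (assumed feature set and
# action levels; the article's exact encoding may differ).
import numpy as np

# Actions: how much energy to buy from the grid in the next interval, as a small
# discrete set of levels (kWh). Storage control then follows implicitly: any surplus
# of purchased + PV energy over demand is charged, any deficit is discharged.
PURCHASE_LEVELS_KWH = np.array([0.0, 0.5, 1.0, 2.0, 4.0])

def build_state(soc, pv_forecast, demand_forecast, price, hour_of_day):
    """Compact state vector: storage level plus a few exogenous signals."""
    return np.array([
        soc,                                       # state of charge, normalized to [0, 1]
        pv_forecast,                               # expected PV output next interval (kWh)
        demand_forecast,                           # expected load next interval (kWh)
        price,                                     # current real-time price ($/kWh)
        np.sin(2 * np.pi * hour_of_day / 24.0),    # time-of-day encoding
        np.cos(2 * np.pi * hour_of_day / 24.0),
    ], dtype=np.float32)

# With five discrete actions and a six-dimensional state, a single Q-network with five
# outputs suffices, which keeps learning tractable compared with directly controlling
# the charge/discharge power over a fine-grained continuous range.
```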

The remainder of this article is organized as follows. The adopted problem formulation is presented in Section 2. Section 3 describes the proposed reinforcement learning approach to optimizing energy purchase decisions. The results of a realistic experimental case study are presented in Section 4. Section 5 summarizes the main findings of this work and outlines some promising continuation directions.

Section snippets

Problem formulation

The results presented in this article are based on a realistic simulation of a storage-integrated solar power system. The assumed system architecture matches standard microgrid infrastructure, and real insolation, energy consumption, and energy price data are used to perform the simulation.
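To make this setup concrete, the following is a minimal sketch of how one interval of such a simulation could be advanced, assuming a simple constant-efficiency storage model; the article's actual non-linear storage characteristics, limits, and data handling are not reproduced here.

```python
# One simulation step of a storage-integrated PV system (illustrative only; the
# article's storage model, efficiencies, and limits are not reproduced here).

def step(soc, purchase, pv, demand, price,
         capacity=10.0, eta_charge=0.95, eta_discharge=0.95):
    """Advance the battery state by one interval and return the purchase cost.

    soc       -- stored energy at the start of the interval (kWh)
    purchase  -- energy bought from the grid during the interval (kWh)
    pv        -- PV generation during the interval (kWh)
    demand    -- load during the interval (kWh)
    price     -- real-time price ($/kWh)
    """
    balance = purchase + pv - demand              # net energy after serving the load
    if balance >= 0.0:
        # Surplus is charged, limited by the remaining capacity; excess PV is lost.
        soc_next = min(capacity, soc + eta_charge * balance)
    else:
        # Deficit is covered from storage; if storage runs out, extra grid energy
        # must be bought at the current price (forced purchase).
        needed = -balance / eta_discharge
        drawn = min(soc, needed)
        soc_next = soc - drawn
        shortfall = (needed - drawn) * eta_discharge
        purchase += shortfall
    cost = purchase * price
    return soc_next, cost
```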

Solution method

In the paradigm of reinforcement learning, a learning agent learns to perform its task through interactions with its environment. At each time step, it observes the current state of the environment and performs an action. It then receives a reinforcement value, also called a reward, and a state transition takes place. State transitions and reinforcement values may, in general, be stochastic, and the agent knows neither their underlying distributions nor their expected values. The objective of
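As a minimal sketch of the learning machinery named in the abstract (Q-learning with a dense deep neural network, experience replay, and a target network), the following update step uses illustrative hyperparameters and network sizes that are assumptions rather than the article's tuned configuration.

```python
# Minimal deep Q-learning update with experience replay and a target network
# (illustrative hyperparameters; not the article's tuned configuration).
import copy
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

n_state, n_actions = 6, 5          # assumed sizes, matching the earlier state/action sketch

q_net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))
target_net = copy.deepcopy(q_net)              # frozen copy, synchronized periodically
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=50_000)                  # experience replay buffer of transitions

def train_step(batch_size=64, gamma=0.99):
    """One Q-learning update on a random minibatch drawn from the replay buffer."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    s = torch.as_tensor(np.array(states), dtype=torch.float32)
    a = torch.as_tensor(actions, dtype=torch.int64)
    r = torch.as_tensor(rewards, dtype=torch.float32)
    s2 = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    done = torch.as_tensor(dones, dtype=torch.float32)
    with torch.no_grad():
        # Bootstrapped target computed with the frozen target network.
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Periodically: target_net.load_state_dict(q_net.state_dict()).
# The discount factor gamma can also be scheduled to increase over training, which
# corresponds to the "increasing discount factor" enhancement mentioned in the abstract.
```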

Case study

To verify the effectiveness and performance of the proposed Q-learning approach, several test cases and comparisons were studied.

Conclusions

The combination of reinforcement learning with deep neural networks has become one of the most promising areas in the field of artificial intelligence. Successful applications to practical real-time control problems are still hard to find, though. This article managed to overcome difficulties anticipated by theory and observed in prior experimental work, such as insufficient learning speed and poor stability. We believe the achieved performance can be primarily

CRediT authorship contribution statement

Waldemar Kolodziejczyk: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Izabela Zoltowska: Supervision, Research methodology, Writing - original draft, Writing - review & editing. Pawel Cichosz: Supervision, Research methodology, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (53)

  • Stoppato, A., et al. A PSO (particle swarm optimization)-based model for the optimal management of a small PV (photovoltaic)-pump hydro energy storage in a rural dry area. Energy (2014).

  • Urtasun, A., et al. State-of-charge-based droop control for stand-alone AC supply systems with distributed energy storage. Energy Conversion and Management (2015).

  • Ampatzis, M., et al. Robust optimisation for deciding on real-time flexibility of storage-integrated photovoltaic units controlled by intelligent software agents. IET Renewable Power Generation (2017).

  • Anderson, C. W. Strategy learning with multilayer connectionist representations.

  • Arani, A. K., et al. Review on energy storage systems control methods in microgrids. International Journal of Electrical Power & Energy Systems (2019).

  • Arulkumaran, K., et al. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine (2017).

  • Baird, L. C. Residual algorithms: Reinforcement learning with function approximation.

  • Barto, A. G., et al. On the computational economics of reinforcement learning.

  • Barto, A. G., et al. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics (1983).

  • Bellman, R. E. Dynamic programming (1957).

  • Boyan, J., et al. Generalization in reinforcement learning: Safely approximating the value function.

  • Caelen, O., et al. Improving the exploration strategy in bandit algorithms.

  • Cichosz, P. Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning. Journal of Artificial Intelligence Research (1995).

  • Cichosz, P. An analysis of experience replay in temporal difference learning. Cybernetics and Systems (1999).

  • Crites, R. H., et al. Improving elevator performance using reinforcement learning.

  • François-Lavet, V., et al. How to discount deep reinforcement learning: Towards new dynamic strategies (2015).