Improved residential energy management system using priority double deep Q-learning

https://doi.org/10.1016/j.scs.2021.102812

Highlights

  • Introduced priority deep Q-learning (PDQN-DR) to prioritize past learned experience.

  • A novel reward function for the reinforcement learning agent.

  • A DR-adapted Epsilon Greedy Policy to guide the agent.

  • Reduced consumers’ monthly electricity bill by 13.2% and peak demand by 3%.

  • The small environment saved 13.6% in consumers’ monthly electricity bill.

Abstract

In the current era, electricity demand has skyrocketed. Power grids face highly uneven power demand daily, and during certain periods of the day demand peaks, making it difficult for the grid to meet it. To deal with this problem, an intelligent Home Energy Management (HEM) system can be beneficial. Smart HEM systems can shift loads from peak to off-peak hours, thereby reducing the peak load on the grid as well as the costs incurred by the user. In this paper, we propose a Deep Reinforcement Learning model with prioritized experience sampling (PDQN-DR) for appropriate demand response, and the problem of load shifting is simulated as a game. We also propose a novel reward system for better convergence of the DRL model to near-optimal strategies, and a DR-adapted Epsilon Greedy Policy to guide the agent in the exploration phase for faster convergence. The proposed system minimizes the peak power demand and consumers’ bills simultaneously. The proposed method successfully reduced the peak load and peak costs in the smaller DR environment; in the standard DR environment, the agent reduced costs and the overall variance of the 24-h load profile for all customers.

Introduction

Energy demands worldwide have grown steadily each year, driven by advances in technology and the electrical appliances that come with them. Electric appliance ownership is rising, which has led to an increase in electricity demand. High demand without a corresponding increase in supply leads to higher costs. However, because electricity is an essential commodity, higher prices do little to suppress demand: consumers are willing to pay more since these appliances have become essential to their day-to-day lives. Managing energy demand efficiently therefore becomes an important task.

The traditional grids currently available are too primitive to handle high energy demands efficiently. For the task of managing energy demands and handling the interaction between utilities and consumers, a Smart Grid (SG) is one of the most viable choices (Cecati, Mokryani, Piccolo, & Siano, 2010). The SG's goals include efficient information delivery for optimal load control, so that system demand and costs are minimized and energy efficiency is maximized. Given the high energy demands, research on improving SGs has increased over the last few years. Owing to the dramatic energy demands in the market and the daily trends of energy consumption, load usually peaks during particular periods of the day. Demand Side Management (DSM) can be used to mitigate such peak loads. DSM (Palensky & Dietrich, 2011) encourages consumers to alter their usage strategies to reduce the load on the SG during peak hours of the day, and offers incentives to the users who participate. DSM offers many programs, including energy efficiency (EE), energy conservation (EC), and demand response (DR) (Boshell & Veloza, 2008).

Demand Response (DR) (Rahimi & Ipakchi, 2010) has become a favorable option for handling this situation and avoiding an energy market meltdown. DR involves changes in consumers' electricity usage patterns in response to changes in the cost of electricity (Albadi & El-Saadany, 2007b). DR can be used to shift loads from peak hours to times of the day when demand is low. DR does not change the total energy consumed by the consumers but changes when that energy is consumed. This can effectively increase energy handling efficiency and lower energy costs for consumers. Electricity generated by different sources has varying costs, and these generators are dispatched in conjunction to meet demand, which makes the cost of electricity very dynamic. That is where DR comes into the picture: it can exploit the dynamic nature of prices to schedule loads to those times of the day when the cost of consumption is low, leading to lower consumer costs and reduced peak loads. DR-based optimization models can be categorized into two types: price-based (Chen, Wu, & Fu, 2012) and incentive-based (Asadinejad, Tomsovic, & Chen, 2016). In this paper, a price-based optimization model is used. Price-based models use different pricing strategies such as Time of Use (ToU) pricing, peak load pricing, critical peak pricing, and real-time pricing (Severin, Michael, & Rosenfeld, 2002). This varying nature of pricing leads consumers to adjust their usage patterns to take advantage of lower prices during particular periods. Incentive-based DR programs are mainly of two types: classical programs and market-based programs. Classical programs include Direct Load Control and Interruptible programs, while market-based programs include Emergency DR, Demand Bidding, Capacity Market, and the Ancillary Services Market (Albadi & El-Saadany, 2007a). In Caron and Kesidis (2010), the authors proposed a pricing scheme with incentives for consumers to achieve a lower aggregate load profile; they also studied how much load demand minimization is possible given the amount of information consumers share. In Ghazvini et al. (2015), linear and nonlinear models for incentive-based DR in real power markets were proposed. System-level dispatch of demand response resources with a novel incentive-based demand response model was proposed by Yu and Hong (2017). In Aalami, Moghaddam, and Yousefi (2010), the authors propose an Interruptible program, including penalties for customers who do not respond to load reduction. A real-time implementation of incentive-based DR programs with hardware for residential buildings is shown in Caron and Kesidis (2010). In Zhong, Xie, and Xia (2012), a novel DR program targeting small to medium size commercial, industrial, and residential customers is proposed.
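
As a minimal illustration of how a price-based model exploits time-varying prices, the sketch below compares a household's daily cost before and after shifting a deferrable load from evening peak hours to off-peak hours under a ToU tariff. The tariff values, load profile, and hour indices are illustrative assumptions, not data from this study.

    # Minimal ToU load-shifting illustration (hypothetical tariff and load values).
    import numpy as np

    # Hourly ToU tariff for one day: off-peak, mid-peak, peak, mid-peak (USD/kWh).
    tariff = np.array([0.08] * 7 + [0.12] * 10 + [0.20] * 4 + [0.12] * 3)

    base_load = np.full(24, 0.5)      # non-shiftable background load (kWh per hour)
    base_load[18:22] += 1.5           # deferrable appliance run during evening peak

    def daily_cost(load, price):
        """Total cost of a 24-hour load profile under an hourly tariff."""
        return float(np.dot(load, price))

    # Spread the 6 kWh deferrable block over off-peak early-morning hours instead.
    shifted_load = base_load.copy()
    shifted_load[18:22] -= 1.5
    shifted_load[0:6] += 1.0

    print("Cost before shifting:", round(daily_cost(base_load, tariff), 2))
    print("Cost after shifting :", round(daily_cost(shifted_load, tariff), 2))
    print("Peak demand before/after (kW):", base_load.max(), shifted_load.max())

Total consumption is unchanged; only its timing (and hence its cost and its contribution to the peak) changes, which is exactly the behaviour DR tries to induce.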

Reinforcement learning (RL) (Sutton & Barto, 2018) is a class of methods that learn which action to perform at each time step, given the state of the environment, so as to maximize rewards. RL has a track record of handling highly dynamic problems and environments by estimating state-action values. These values quantify how good it is to perform an action in a given state and are used to build a policy that maps states to the actions the agent should take. RL has been shown to perform well even with no prior domain knowledge.
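
For reference, the standard tabular Q-learning update that produces these state-action values (Sutton & Barto, 2018) can be written as

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],

where \alpha is the learning rate, \gamma is the discount factor, and r_{t+1} is the reward received after taking action a_t in state s_t. The DQN agents discussed below approximate Q with a neural network rather than a table.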

In this paper, we use Deep Reinforcement Learning (DRL) (van Hasselt, Guez, & Silver, 2015) with prioritized experience sampling, wherein deep learning techniques such as neural networks are used in conjunction with RL techniques to better approximate the state-action value function, as shown in Fig. 1. It has been shown that DQN agents, which are a DRL technique, outperform traditional RL techniques, solidifying DRL's standing in machine learning (Mnih et al., 2015). This paper is a continuation of the work introduced in Mathew, Roy, and Mathew (2020), and it uses a similar environment but with a different reward function and agent. The following are the main contributions of this work:

  • 1.

    Introduced Priority Deep Q-learning (PDQN-DR) to prioritize past learned experience for faster convergence in Demand Side Management for demand response (an illustrative sketch of prioritized sampling and the exploration policy follows this list).

  • 2.

    Introduced a novel reward function that helps the reinforcement learning agent understand the environment better, since it receives a more frequent stream of rewards rather than sparse rewards for each action.

  • 3.

    Introduced a DR-adapted Epsilon Greedy Policy to guide the agent in the exploration phase for faster convergence.

  • 4.

    The proposed reinforcement learning model with the standard environment saved 13.2% in consumers’ monthly electricity bill and reduced peak demand by 3%, whereas MILP could only reduce consumers’ monthly electricity bill by 3.3%.

  • 5.

    The proposed reinforcement learning model with the small environment saved 13.6% in consumers’ monthly electricity bill.
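
To make contributions 1 and 3 concrete, the sketch below shows a generic proportional prioritized replay buffer and a decaying epsilon-greedy action selector of the kind commonly paired with DQN agents. This is an illustrative sketch under assumed settings, not the paper's exact PDQN-DR priority rule or DR-adapted exploration schedule; the class names, hyperparameters (capacity, alpha, epsilon bounds, decay rate), and the exponential decay form are all placeholders.

    # Illustrative sketch (not the authors' exact PDQN-DR implementation) of
    # proportional prioritized experience replay and a decaying epsilon-greedy policy.
    import random
    import numpy as np

    class PrioritizedReplayBuffer:
        def __init__(self, capacity=10000, alpha=0.6):
            self.capacity = capacity
            self.alpha = alpha          # how strongly the TD error drives sampling
            self.buffer = []
            self.priorities = []

        def add(self, transition, td_error=1.0):
            # New transitions get a high priority so they are replayed at least once.
            if len(self.buffer) >= self.capacity:
                self.buffer.pop(0)
                self.priorities.pop(0)
            self.buffer.append(transition)
            self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

        def sample(self, batch_size=32):
            # Sample transitions in proportion to their priority.
            probs = np.array(self.priorities) / sum(self.priorities)
            idx = np.random.choice(len(self.buffer), batch_size, p=probs)
            return [self.buffer[i] for i in idx], idx

        def update_priorities(self, idx, td_errors):
            # Refresh priorities of the sampled transitions with the new TD errors.
            for i, err in zip(idx, td_errors):
                self.priorities[i] = (abs(err) + 1e-6) ** self.alpha

    def epsilon_greedy(q_values, step, eps_start=1.0, eps_end=0.05, decay=0.001):
        """Pick a random action with probability eps, decayed over training steps."""
        eps = eps_end + (eps_start - eps_end) * np.exp(-decay * step)
        if random.random() < eps:
            return random.randrange(len(q_values))
        return int(np.argmax(q_values))

In a full agent, the sampled batch would be used to train the online Q-network, with the priorities of the sampled transitions refreshed from the new TD errors after each gradient step.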

Section snippets

Related work

There have been many works on Demand Response optimization. Conejo, Morales, and Baringo (2010) describe a linear programming algorithm that schedules loads on an hourly basis according to hourly prices. Chavali, Yang, and Nehorai (2014) describe an approximate greedy iterative algorithm for scheduling loads to minimize cost. Optimal load scheduling using Mixed Integer Linear Programming (MILP) has been discussed in Lokeshgupta and Sivasubramani (2019b),

Methods

Machine learning techniques such as Reinforcement Learning (RL) have made considerable strides in learning to take actions in game environments, surpassing humans. It has been shown that carefully designed RL can be beneficial in problems like DR. RL agents always operate within an environment; thus, DR needs to be modeled as a game environment. Atari games like Tetris are very similar to the DR environment we need. Tetris allows the player to move a 2D block within a 2D grid. The player's goal is to

Simulation results and discussion

The proposed scheme is evaluated in two environments of different sizes, i.e., a smaller DR environment (8×10) and a standard DR environment (24×25), to demonstrate the agent's performance. A large grid environment has an enormous state space: in an environment p units high and q units wide, the state space is O(2^(pq)). The smaller DR environment therefore has a state space of size O(2^80) and the standard DR environment a state space of size O(2^600). These huge state spaces make a smaller neural network

Conclusion

The exponential growth in household power demand has increased the stress on the power grid to meet that demand. DR can help a smart grid improve its efficiency in meeting customers' power needs. This paper introduces advanced DRL settings with prioritized experience sampling and a novel reward function to solve load scheduling, simultaneously reducing the utility's peak demand and the consumer's bill. However, here we have investigated some concerns by introducing a new reward

Declaration of Competing Interest

The authors report no declarations of interest.

References (46)

  • F. Boshell et al., Review of developed demand side management programs including different concepts and their results, 2008 IEEE/PES Transmission and Distribution Conference and Exposition: Latin America (2008).
  • Z. Bradac et al., Optimal scheduling of domestic appliances via MILP, Energies (2015).
  • S. Caron et al., Incentive-based energy consumption scheduling algorithms for the smart grid (2010).
  • C. Cecati et al., An overview on the smart grid concept, IECON 2010 – 36th Annual Conference on IEEE Industrial Electronics Society (2010).
  • P. Chavali et al., A distributed algorithm of appliance scheduling for home energy management system, IEEE Transactions on Smart Grid (2014).
  • Z. Chen et al., Real-time price-based demand response management for residential appliances via stochastic optimization and robust optimization, IEEE Transactions on Smart Grid (2012).
  • A.J. Conejo et al., Real-time demand response model, IEEE Transactions on Smart Grid (2010).
  • M.A.F. Ghazvini et al., A multi-objective model for scheduling of short-term incentive-based demand response programs offered by electricity retailers, Applied Energy (2015).
  • S.G. Hamed et al., Multi-objective cost-load optimization for demand side, Information Networking (ICOIN) (2016).
  • S. Haykin, Neural networks: A comprehensive foundation (1999).
  • A. Jindal et al., DRUMS: Demand response management in a smart city using deep learning and SVR, 2018 IEEE Global Communications Conference (GLOBECOM) (2018).
  • A. Khodaei et al., SCUC with hourly demand response considering intertemporal load characteristics, IEEE Transactions on Smart Grid (2011).
  • B. Lokeshgupta et al., Cooperative game theory approach for multi-objective home energy management with renewable energy integration, IET Smart Grid (2019).