Improved residential energy management system using priority double deep Q-learning
Introduction
Energy demands worldwide have grown steadily each year, driven by advances in technology and the electrical appliances that come with them. Rising appliance ownership has increased electricity demand, and high demand without a corresponding increase in supply leads to higher costs. Electricity, however, is a commodity consumers are willing to pay more for: higher prices do little to dampen demand, because the appliances have become essential to day-to-day life. Managing energy demand efficiently is therefore an important task.
Currently available traditional grids are too primitive to handle high energy demands efficiently. For managing energy demand and handling the interaction between utilities and consumers, the Smart Grid (SG) is one of the most viable choices (Cecati, Mokryani, Piccolo, & Siano, 2010). The SG's goals include efficient information delivery for optimal load control, so that system demand and costs are minimized and energy efficiency is maximized. Given the high energy demands, research on improving SGs has increased over the last few years. Because of typical daily consumption trends, there is usually a peak load during particular periods of the day. Demand Side Management (DSM) (Palensky & Dietrich, 2011) can be used to handle such peak loads: it encourages consumers to alter their usage strategies to reduce the load on the SG during peak hours, and it offers incentives to users who participate. DSM comprises many programs, including energy efficiency (EE), energy conservation (EC), and demand response (DR) (Boshell & Veloza, 2008).
Demand Response (DR) (Rahimi & Ipakchi, 2010) has become a favored option for handling this situation and avoiding an energy-market meltdown. DR involves (Albadi & El-Saadany, 2007b) changes in consumers' electricity usage patterns in response to changes in electricity prices. DR can shift loads from peak hours to times of day when demand is low; it does not change consumers' total energy consumption but changes when they consume. This can increase energy-handling efficiency and lower consumers' energy costs. Electricity generated by different sources has varying costs, and these generators operate in conjunction, which makes electricity prices highly dynamic. This is where DR comes into the picture: it can exploit the dynamic nature of prices to schedule loads to those times of day when the cost of consumption is low, leading to lower consumer costs and reduced peak loads. DR-based optimization models fall into two categories: price-based (Chen, Wu, & Fu, 2012) and incentive-based (Asadinejad, Tomsovic, & Chen, 2016). In this paper, a price-based optimization model is used. Price-based models use different pricing strategies, such as Time-of-Use (ToU) pricing, peak-load pricing, critical-peak pricing, and real-time pricing (Severin, Michael, & Rosenfeld, 2002); the varying prices lead consumers to adjust their usage patterns to take advantage of lower prices during particular periods. Incentive-based DR programs are mainly of two types: classical programs, which include Direct Load Control and Interruptible programs, and market-based programs, which include Emergency DR, Demand Bidding, Capacity Market, and the Ancillary Services Market (Albadi & El-Saadany, 2007a).
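To make the price-based idea concrete, the sketch below computes the cost of running an appliance under a hypothetical ToU tariff and shows how shifting the same load off-peak reduces the bill. The tariff bands, rates, and the 2 kW load are illustrative assumptions, not values from the paper.

```python
# Hypothetical Time-of-Use (ToU) tariff: $/kWh by hour-of-day band.
def tou_rate(hour):
    if 17 <= hour < 21:      # evening peak
        return 0.30
    if 7 <= hour < 17:       # daytime shoulder
        return 0.15
    return 0.08              # overnight off-peak

def run_cost(start_hour, duration_h, load_kw):
    """Cost of running a constant load for duration_h hours from start_hour."""
    return sum(tou_rate((start_hour + h) % 24) * load_kw
               for h in range(duration_h))

peak_cost = run_cost(18, 2, 2.0)     # 2 kW appliance during the evening peak
shifted_cost = run_cost(23, 2, 2.0)  # same appliance shifted off-peak
print(peak_cost, shifted_cost)       # 1.2 vs 0.32
```

The total energy consumed is identical in both cases; only the timing changes, which is exactly the DR behavior described above.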
In Caron and Kesidis (2010), the authors proposed a pricing scheme with incentives for consumers to achieve a lower aggregate load profile; they also studied how much load-demand minimization is possible given the amount of information consumers share. In Ghazvini et al. (2015), linear and nonlinear models of incentive-based DR for real power markets were proposed. System-level dispatch of demand-response resources with a novel incentive-based demand-response model was proposed by Yu and Hong (2017). In Aalami, Moghaddam, and Yousefi (2010), the authors propose an Interruptible program that includes penalties for customers who do not respond to load reduction. A real-time hardware implementation of incentive-based DR programs for residential buildings is shown in Caron and Kesidis (2010). In Zhong, Xie, and Xia (2012), a novel DR program targeting small-to-medium commercial, industrial, and residential customers is proposed.
Reinforcement learning (RL) (Sutton & Barto, 2018) is a class of solutions that learn which action to perform at each time step, given the state of the environment, so as to maximize rewards. RL has a track record of handling highly dynamic problems and environments by evaluating state-action values. These state-action values measure the value of performing an action in a given state and are used to create a policy mapping, wherein states are mapped to the actions the agent should perform. RL has been shown to perform well even with no prior domain knowledge.
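The state-action values and policy mapping described above can be sketched with the standard tabular Q-learning update. This is the generic textbook rule, not the paper's deep agent (which replaces the table with a neural network); the states, actions, and learning-rate values here are illustrative.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95        # illustrative learning rate and discount
Q = defaultdict(float)          # maps (state, action) -> estimated value

def update(state, action, reward, next_state, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def greedy(state, actions):
    # Policy mapping: pick the action with the highest estimated value.
    return max(actions, key=lambda a: Q[(state, a)])

actions = ["run_now", "defer"]
update("peak_hour", "defer", reward=1.0, next_state="off_peak", actions=actions)
print(greedy("peak_hour", actions))  # "defer" after the positive update
```

A DQN approximates this same Q(s, a) function with a network instead of a lookup table, which is what makes large state spaces tractable.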
In this paper, we use Deep Reinforcement Learning (DRL) (van Hasselt, Guez, & Silver, 2015) with prioritized experience sampling, wherein deep learning techniques such as neural networks are used in conjunction with RL techniques to better approximate the state-action value functions, as shown in Fig. 1. DQN agents, which belong to the DRL family, have been shown to outperform traditional RL techniques, solidifying their standing in machine learning (Mnih et al., 2015). This paper is a continuation of the work introduced in Mathew, Roy, and Mathew (2020); it uses a similar environment but with a different reward function and agent. The following are the main contributions of this work:
- 1.
Introduced Priority Deep Q-learning (PDQN-DR) to prioritize past learned experiences for faster convergence in Demand Side Management for demand response.
- 2.
Introduced a novel reward function that helps the reinforcement learning agent understand the environment better, since the agent receives a more frequent stream of rewards rather than a sparse reward per action.
- 3.
Introduced a DR-adapted epsilon-greedy policy to guide the agent in the exploration phase for faster convergence.
- 4.
In the standard environment, the proposed reinforcement learning model saved 13.2% on consumers' monthly electric bills and reduced peak demand by 3%, whereas MILP reduced consumers' monthly electric bills by only 3.3%.
- 5.
In the small environment, the proposed reinforcement learning model saved 13.6% on consumers' monthly electric bills.
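The prioritized experience sampling behind contribution 1 can be sketched as a proportional prioritized replay buffer, in which transitions with larger temporal-difference (TD) error are replayed more often. This is a simplification of the standard proportional scheme; the paper's exact PDQN-DR variant may differ, and the transitions, capacity, and hyperparameters here are illustrative.

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer (illustrative)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        # Priority grows with TD error; eps keeps every priority non-zero.
        p = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) >= self.capacity:      # drop the oldest transition
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Sample in proportion to priority: surprising transitions dominate.
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        return random.choices(self.data, weights=weights, k=batch_size)

buf = PrioritizedReplay(capacity=100)
buf.add(("s0", "defer", 1.0, "s1"), td_error=2.0)    # surprising transition
buf.add(("s1", "run_now", 0.0, "s2"), td_error=0.01) # well-predicted one
batch = buf.sample(4)   # mostly contains the high-error transition
```

Replaying high-error transitions more frequently is what gives the agent faster convergence than uniform sampling, at the cost of a bias that full implementations correct with importance-sampling weights.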
Section snippets
Related work
There have been many works on Demand Response optimization. Conejo, Morales, and Baringo (2010) describe a linear programming algorithm that schedules loads on an hourly basis according to hourly prices. Chavali, Yang, and Nehorai (2014) describe an approximate greedy iterative algorithm that schedules loads for cost minimization. Optimal load scheduling using Mixed Integer Linear Programming (MILP) has been discussed in Lokeshgupta and Sivasubramani (2019b),
Methods
Machine learning techniques like Reinforcement Learning (RL) have made considerable strides in learning to act in game environments, even surpassing humans. It has been shown that carefully designed RL can benefit problems like DR. RL agents always work within an environment; thus, DR needs to be modeled as a game environment. Atari games like Tetris are very similar to the DR environment we need: Tetris allows the player to move a 2D block in a 2D grid. The player's goal is to
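The Tetris-like framing above can be sketched as a binary grid whose rows are time slots and whose columns are units of power capacity, so that scheduling an appliance "drops" a block of cells into the grid. The dimensions, the capacity limit, and the example appliance below are made up for the sketch; the paper's environment may encode the grid differently.

```python
# Illustrative Tetris-like DR grid: 24 hourly rows, 10 units of capacity wide.
HOURS, WIDTH = 24, 10
grid = [[0] * WIDTH for _ in range(HOURS)]

def schedule(grid, start_hour, duration_h, power_units):
    """Place an appliance block; returns False if capacity would be exceeded."""
    rows = range(start_hour, start_hour + duration_h)
    if any(sum(grid[h]) + power_units > WIDTH for h in rows):
        return False                      # block does not fit: reject move
    for h in rows:
        placed = 0
        for c in range(WIDTH):            # fill the leftmost free cells
            if grid[h][c] == 0 and placed < power_units:
                grid[h][c] = 1
                placed += 1
    return True

schedule(grid, start_hour=22, duration_h=2, power_units=3)  # off-peak washer
print(sum(map(sum, grid)))   # 6 occupied cells
```

In this framing, the agent's actions move appliance blocks across time slots, and keeping each row under the capacity limit corresponds to keeping aggregate demand below a peak threshold.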
Simulation results and discussion
The proposed scheme is evaluated on two environments of different sizes, a smaller DR environment and a standard DR environment, to demonstrate the agent's performance. A large grid environment has an enormous state space that grows exponentially with the grid's height and width, so the smaller and standard DR environments have correspondingly smaller and larger state spaces. These huge state spaces make a smaller neural network
Conclusion
The exponential growth of household power demand has increased the stress on the power grid to meet it. DR can help a smart grid improve its efficiency in meeting customers' power needs. This paper introduces advanced DRL settings with prioritized experience sampling and a novel reward function to solve load scheduling, simultaneously reducing the utility's peak demand and the consumer's bill. However, here we have addressed some concerns by introducing a new reward
Declaration of Competing Interest
The authors report no declarations of interest.
References (46)
- et al. Demand response modeling considering interruptible/curtailable loads and capacity market programs. Applied Energy (2010).
- et al. Incentive-based demand response for smart grid with reinforcement learning and deep neural network. Applied Energy (2019).
- et al. Demand response algorithms for smart-grid ready residential buildings using machine learning models. Applied Energy (2019).
- et al. Exploiting heuristic algorithms to efficiently utilize energy management controllers with renewable energy sources. Energy and Buildings (2016).
- et al. Incentive-based demand response considering hierarchical electricity market: A Stackelberg game approach. Applied Energy (2017).
- et al. Demand response in electricity markets: An overview.
- et al. Demand response in electricity markets: An overview. 2007 IEEE Power Engineering Society General Meeting (2007).
- et al. Demand response strategy based on reinforcement learning and fuzzy reasoning for home energy management. IEEE Access (2020).
- et al. A brief survey of deep reinforcement learning (2017).
- et al. Sensitivity of incentive based demand response program to residential customer elasticity. 2016 North American Power Symposium (NAPS) (2016).