Building HVAC control with reinforcement learning for reduction of energy cost and demand charge
Graphical abstract
Training and testing diagram: blue lines indicate the training phase to get optimal weights for Q-network; red lines indicate the testing phase using trained Q-network; green lines indicate the common steps shared by both training and testing.
Introduction
It has been well acknowledged that commercial buildings account for over 20% of the total energy consumption in the U.S. [1]. In particular, about 40–50% of this consumption is attributed to heating, ventilation, and air-conditioning (HVAC) systems [2]. Energy efficiency therefore remains a critical topic, and effective control and maintenance strategies for operating HVAC systems are indispensable to improving it. Traditionally, simple rule-based and feedback controls, such as on–off control or PID control, have been used to regulate the indoor environment based on time and occupancy schedules [3]. However, these two categories of controllers yield sub-optimal performance because they lack predictive information on external disturbances such as weather [4] and occupancy [5].
Model predictive control (MPC), an optimal control strategy, has been leveraged to tackle these issues by iteratively conducting online optimization over a receding time horizon. Though many successes have been demonstrated using MPC [6], [7], [8], its performance depends heavily on the accuracy of the model [9], [10] as well as the robustness of the online optimization [11], [12]. The key challenge is balancing the trade-off between model accuracy and computational tractability. Another emerging technique that has received considerable attention is deep reinforcement learning (DRL), a learning-based optimal control method in which the control agent learns by interacting directly with the environment. It has been used to solve high-dimensional sequential decision-making and continuous control problems, such as games [13], robotics [14], and autonomous driving [15], that were difficult for tabular RL. While many works have demonstrated the applicability of DRL to optimal building HVAC control [16], [17], [18], [19], [20], the sample complexity (the number of training samples required) remains a significant issue to be addressed.
Most recent works leveraging DRL approaches for building HVAC control have extensively studied energy consumption and thermal comfort requirements [7], [16]. However, due to time-varying electricity price profiles (with different on-peak/off-peak prices), energy cost, rather than consumption, is of greater interest to commercial buildings. Moreover, the cost incurred by the peak electric demand, the so-called demand charge, further complicates the design of optimal controllers for building HVAC systems. These pricing structures can be exploited via passive thermal energy storage [21], which temporally shifts energy consumption by exploiting the thermal inertia of the solid building components. This technique can yield significant cost reduction without sacrificing thermal comfort, but proper application requires accurate planning over horizons of 12–36 h. Though such a problem has been extensively investigated using MPC-based methods [22], [23], [24], our work instead addresses it with recently developed DRL approaches.
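To make the cost structure concrete, the sketch below computes a bill consisting of a time-of-use energy cost plus a demand charge proportional to the single highest demand over the billing period. All tariff values, window boundaries, and parameter names here are illustrative assumptions, not the rates used in the paper.

```python
import numpy as np

def electricity_cost(power_kw, hours, on_peak=(12, 18),
                     price_on=0.20, price_off=0.08, demand_rate=15.0):
    """Illustrative bill = time-of-use energy cost + demand charge.

    power_kw    : hourly average electric power draw (kW), one value per hour
    hours       : hour-of-day for each sample
    on_peak     : [start, end) hours of the on-peak pricing window (assumed)
    demand_rate : $/kW applied to the single highest hourly demand
    """
    power_kw = np.asarray(power_kw, dtype=float)
    hours = np.asarray(hours)
    on = (hours >= on_peak[0]) & (hours < on_peak[1])
    # Energy cost: each kWh is billed at the price of its pricing window.
    energy_cost = np.sum(power_kw * np.where(on, price_on, price_off))
    # Demand charge: proportional to the peak demand, not total consumption.
    demand_charge = demand_rate * power_kw.max()
    return energy_cost + demand_charge
```

Because the demand charge depends only on the single peak, shifting a few kW out of the worst hour (e.g., by pre-cooling the building mass off-peak) can reduce the bill far more than the same energy saved elsewhere.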
Contributions. In this work, by taking into consideration practical scenarios where most key states are not fully observable, the environment is defined as a partially observable Markov decision process (POMDP) instead of an MDP as done in previous works [17], [25]. Specifically, the following contributions are made.
- We develop a model-free DRL approach, a deep Q-network (DQN) augmented with an action processor, that outputs near-optimal control actions given only a limited amount of historical data and user-defined comfort limits, minimizing the time-of-use energy cost and demand charge while maintaining comfort requirements.
- We also develop a reward shaping technique to overcome the reward sparsity caused by the demand charge.
- We propose a model selection technique to choose the best neural-network weights once the training phase is complete, and we compare three different model selection methods within the proposed DQN framework.
- We empirically validate the proposed approach using a single-zone building simulation platform with a local PI controller regulating the zone temperature. Two scenarios, one with demand charges and one without, are investigated. In both scenarios, the temperature trajectories show that pre-cooling is performed in the early morning when electricity is cheaper, reducing expensive daytime consumption and lowering peak demand.
- We evaluate the strategies from a practical standpoint by including the cost of random exploration in the performance metrics. Most other RL works assume that a very large amount of historical data is freely available for training; in real buildings, however, electricity bills cannot be put on hold while exploratory data is collected, so these costs cannot be neglected.
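The reward shaping mentioned in the contributions addresses a structural problem: the demand charge is assessed once per billing period on the single peak, so a per-step reward that only bills it at the end is extremely sparse. One common way to shape such a signal, shown below as an illustrative sketch rather than the paper's exact formulation, is to charge the marginal increase of the running peak at each step; the increments telescope, so the episode total still equals the true demand charge.

```python
def shaped_reward(power_kw, peak_so_far, energy_cost_step,
                  comfort_violation, demand_rate=15.0, comfort_penalty=10.0):
    """Illustrative dense reward (all weights are assumed placeholders).

    Instead of one large penalty at the end of the billing period, penalize
    demand_rate * (increase of the running peak) immediately. Summed over an
    episode, these increments telescope to demand_rate * final_peak.
    """
    new_peak = max(peak_so_far, power_kw)
    demand_increment = demand_rate * (new_peak - peak_so_far)
    reward = -(energy_cost_step
               + demand_increment
               + comfort_penalty * comfort_violation)
    return reward, new_peak
```

With this shaping, an action that raises the all-time peak is penalized at the step where it happens, giving the Q-network a usable learning signal long before the billing period closes.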
Outline. The rest of the paper is outlined as follows. Section 2 presents related works on the applications of MPC and DRL to the building HVAC control as well as introduces time-varying cost and demand charge. While Section 3 provides an overview of POMDP and DQN, in Section 4 we describe the system in detail, consisting of the dynamics, cost function, and the rule-based policies. The proposed approaches and techniques are presented in Section 5. The simulation experiments including results and discussion are given in Section 6. Finally, concluding remarks and possible directions for future research are summarized in Section 7.
Related work and novelty
In this section, we review recent related work using MPC and DRL approaches, and we discuss the novelty and significance of our work.
Preliminaries
We provide the background information in this section before the proposed method is introduced in the following sections. We first present the POMDP and then revisit the Deep Q-Network (DQN).
System description
Before discussing our proposed approach, we describe our system of interest, including relevant dynamics, cost structure, action/observation spaces, and rule-based policies.
Proposed approach
While DRL is a very powerful general paradigm for learning to look ahead and maximize future rewards, current training algorithms and techniques are often constrained in practice by the lack of a large amount of training samples (spanning different states and actions), imperfect reward assignment schemes (assigning suboptimal rewards and penalties for actions taken in different states), and the length of the reward horizon (the longer the horizon, the less effect a future …
Simulation testbed
To test our proposed approach on the system described in Section 4, we make use of a simplified zone simulator developed for this and other work. Internally, the simulator models the zone using a second-order linear model, with one state for the (uniform) air temperature in the zone and a second state for the (uniform) temperature of the solid mass surrounding the zone. Both states receive direct heat loads from solar radiation, plug loads, and internal occupants. In addition, they exchange …
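A minimal sketch of such a second-order linear zone model is given below, using a forward-Euler step. The two states exchange heat with each other and the air state exchanges heat with the outdoors; the capacitance and resistance values are placeholder assumptions, not the paper's calibrated parameters.

```python
def zone_step(T_air, T_mass, T_out, q_hvac, q_gain, dt=3600.0,
              C_air=1.0e6, C_mass=1.0e7, R_am=0.01, R_ao=0.05):
    """One Euler step of an illustrative two-state zone thermal model.

    T_air, T_mass : zone air and solid-mass temperatures (degC)
    T_out         : outdoor air temperature (degC)
    q_hvac        : HVAC heat input to the air (W, negative = cooling)
    q_gain        : direct heat gains (solar, plug loads, occupants) (W)
    C_*           : thermal capacitances (J/K); R_* : resistances (K/W)
    """
    # Air node: coupled to the mass, the outdoors, and the direct loads.
    dT_air = ((T_mass - T_air) / R_am + (T_out - T_air) / R_ao
              + q_hvac + q_gain) / C_air
    # Mass node: coupled to the air and receiving direct gains; its large
    # capacitance is what makes pre-cooling (passive storage) effective.
    dT_mass = ((T_air - T_mass) / R_am + q_gain) / C_mass
    return T_air + dt * dT_air, T_mass + dt * dT_mass
```

The large mass capacitance relative to the air is the mechanism behind pre-cooling: energy removed from the mass in cheap off-peak hours is slowly released back, flattening the expensive daytime peak.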
Conclusions and future work
In this paper, we have investigated the use of reinforcement learning for obtaining energy cost savings in buildings in the presence of time-varying electricity price profiles and demand charges. This problem continues to be a challenging one in building HVAC systems. We have proposed improvements to a traditional DQN, including an action processor, model selection, and reward shaping techniques for dealing with limited training data and the issue of sparse reward caused by the demand charge. …
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (47)
- et al., Review of building energy modeling for control and operation, Renewable Sustainable Energy Rev. (2014)
- et al., Use of model predictive control and weather forecasts for energy efficient building climate control, Energy Build. (2012)
- et al., Model predictive control with adaptive machine-learning-based model for building energy efficiency and comfort optimization, Appl. Energy (2020)
- et al., Data-driven model predictive control for building climate control: Three case studies on different buildings, Build. Environ. (2019)
- et al., Ten questions concerning model predictive control for energy efficient buildings, Build. Environ. (2016)
- et al., Building modeling as a crucial part for building predictive control, Energy Build. (2013)
- Efficient nonlinear model predictive control algorithms, Annu. Rev. Control (2004)
- et al., Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning, Energy Build. (2019)
- et al., Advanced building control via deep reinforcement learning, Energy Procedia (2019)
- et al., Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network, Build. Environ. (2020)