
Energy and Buildings

Volume 239, 15 May 2021, 110833

Building HVAC control with reinforcement learning for reduction of energy cost and demand charge

https://doi.org/10.1016/j.enbuild.2021.110833

Abstract

Energy efficiency remains a significant topic in the control of building heating, ventilation, and air-conditioning (HVAC) systems, and a diverse set of control strategies has been developed to optimize performance, including the recently emerging techniques of deep reinforcement learning (DRL). While most existing works have focused on minimizing energy consumption, the generalization to energy cost minimization under time-varying electricity price profiles and demand charges has rarely been studied. Under these utility structures, significant cost savings can be achieved by pre-cooling buildings in the early morning when electricity is cheaper, thereby reducing expensive afternoon consumption and lowering peak demand. However, correctly identifying these savings requires planning horizons of one day or more. To tackle this problem, we develop a Deep Q-Network (DQN) with an action processor, defining the environment as a partially observable Markov decision process (POMDP) with a reward function consisting of energy cost (time-of-use and peak demand charges) and a discomfort penalty, which extends most reward functions used in existing DRL works in this area. Moreover, we develop a reward shaping technique to overcome the reward sparsity caused by the demand charge. Through a single-zone building simulation platform, we demonstrate that the customized DQN outperforms the baseline rule-based policy, saving close to 6% of total cost with demand charges and close to 8% without them.

Graphical abstract

Training and testing diagram: blue lines indicate the training phase used to obtain the optimal weights for the Q-network; red lines indicate the testing phase using the trained Q-network; green lines indicate steps common to both training and testing.


Introduction

It is well acknowledged that commercial buildings account for over 20% of total energy consumption in the U.S. [1], and about 40–50% of that consumption is attributed to heating, ventilation, and air-conditioning (HVAC) systems [2]. Energy efficiency therefore remains a critical topic, and effective control and maintenance strategies for operating HVAC systems are indispensable for improving it. Traditionally, simple rule-based and feedback controllers such as on–off control or PID control have been used to regulate the indoor environment according to time and occupancy schedules [3]. However, these controllers yield sub-optimal performance because they lack predictive information on external disturbances such as weather [4] and occupancy [5].

Model predictive control (MPC) is an optimal control strategy that tackles these issues by iteratively solving an online optimization over a receding time horizon. Although many successes have been demonstrated with MPC [6], [7], [8], its performance depends heavily on the accuracy of the model [9], [10] and the robustness of the online optimization [11], [12]; the key challenge is balancing the trade-off between model accuracy and computational tractability. Another emerging technique that has received considerable attention is deep reinforcement learning (DRL), a learning-based optimal control method in which the control agent learns through direct interaction with the environment. It has been used to solve high-dimensional sequential decision-making and continuous control problems, such as games [13], robotics [14], and autonomous driving [15], that were difficult for tabular RL. Although many works have demonstrated the applicability of DRL to building HVAC optimal control [16], [17], [18], [19], [20], the sample complexity (the number of training samples required) remains a significant issue.

Most recent works applying DRL to building HVAC control have focused on energy consumption and thermal comfort requirements [7], [16]. However, under time-varying electricity price profiles (with different on-peak/off-peak prices), energy cost rather than energy consumption is of greater interest to commercial buildings. Moreover, the cost incurred by the peak electric demand, the so-called demand charge, further complicates the design of optimal controllers for building HVAC systems. These pricing structures can be leveraged via passive thermal energy storage [21], which temporally shifts energy consumption by exploiting the thermal inertia of the solid building components. This technique can yield significant cost reduction without sacrificing thermal comfort, but proper application requires accurate planning over horizons of 12–36 h. Although such problems have been extensively investigated with MPC-based methods [22], [23], [24], our work instead addresses them with recently developed DRL approaches.
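To make this cost structure concrete, the sketch below computes an electricity bill that combines a time-of-use energy charge with a demand charge assessed on the peak interval-average power. The rates, the 15-min metering interval, and the load profiles are hypothetical values chosen for illustration, not data from this paper.

```python
import numpy as np

def electricity_cost(power_kw, price_per_kwh, demand_rate, dt_hours=0.25):
    """Time-of-use energy cost plus a demand charge on the peak interval power.

    power_kw      : interval-average electric power for each metering interval [kW]
    price_per_kwh : time-of-use energy price for each interval [$/kWh]
    demand_rate   : charge on the maximum interval-average power over the
                    billing period [$/kW] (a single day is used below for brevity)
    dt_hours      : metering interval length, assumed 15 min here [h]
    """
    energy_cost = float(np.sum(power_kw * dt_hours * price_per_kwh))  # sum_t p_t * P_t * dt
    demand_charge = demand_rate * float(np.max(power_kw))             # r_d * max_t P_t
    return energy_cost + demand_charge

# Hypothetical profiles: pre-cooling moves load into cheap early-morning hours
# and shaves the afternoon peak, lowering both cost terms despite using more energy.
hours = np.arange(0, 24, 0.25)
price = np.where((hours >= 12) & (hours < 18), 0.30, 0.10)          # $/kWh (on-peak / off-peak)
baseline = np.where((hours >= 12) & (hours < 18), 80.0, 20.0)       # kW, afternoon-heavy load
precool = np.where((hours >= 5) & (hours < 12), 50.0,
          np.where((hours >= 12) & (hours < 18), 55.0, 20.0))       # kW, load shifted earlier
for name, profile in [("baseline", baseline), ("pre-cooling", precool)]:
    print(name, round(electricity_cost(profile, price, demand_rate=15.0), 2))
```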

Contributions. In this work, by taking into consideration practical scenarios where most key states are not fully observable, the environment is defined as a partially observable Markov decision process (POMDP) instead of an MDP as done in previous works [17], [25]. Specifically, the following contributions are made.

  • We develop a model-free DRL approach, a Deep Q-Network (DQN) with an action processor that outputs near-optimal control actions given only a limited amount of historical data and user-defined comfort limits, to minimize the time-of-use cost and demand charge, while maintaining the comfort requirements.

  • We also develop a reward shaping technique to overcome the reward sparsity caused by the demand charge (an illustrative sketch follows this list).

  • We propose a model selection technique to choose the best neural network weights for use once the training phase is complete. We employ three different model selection methods with the proposed DQN framework in order to show a thorough comparison.

  • We empirically validate the proposed approach using a single-zone building simulation platform with a local PI controller regulating the zone temperature. Two scenarios, one with demand charges and one without, are investigated. In both scenarios, the resulting temperature trajectories show that pre-cooling is performed in the early morning when electricity is cheaper, thereby reducing expensive daytime consumption and lowering peak demand.

  • We perform a practical evaluation of the strategies by including the cost of random exploration in the performance metrics. Most existing RL works assume that a very large amount of historical data is freely available for training; however, in real buildings, electricity bills cannot be put on hold while exploratory data is collected, so these costs cannot be neglected.
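
As an illustration of the kind of reward shaping referred to above, the sketch below spreads the once-per-billing-period demand charge across individual time steps by penalizing only increases in the running peak demand. This is one plausible instantiation under stated assumptions, not necessarily the exact shaping used in the paper, and the weight and rate values are hypothetical.

```python
def shaped_step_reward(energy_cost, discomfort, power_kw, peak_so_far_kw,
                       demand_rate=15.0, w_comfort=1.0):
    """One-step reward combining energy cost, a discomfort penalty, and a
    shaped demand-charge term.

    Instead of charging demand_rate * (max power) once per billing period,
    which yields a sparse signal, the increment of the running peak is
    charged at the step where it occurs. Summed over the period, the shaped
    terms add up to the same demand charge, but blame is assigned immediately.
    """
    new_peak = max(peak_so_far_kw, power_kw)
    demand_increment = demand_rate * (new_peak - peak_so_far_kw)  # zero unless a new peak is set
    reward = -(energy_cost + w_comfort * discomfort + demand_increment)
    return reward, new_peak

# Usage within an episode (illustrative values):
peak = 0.0
r1, peak = shaped_step_reward(energy_cost=2.0, discomfort=0.0, power_kw=60.0, peak_so_far_kw=peak)
r2, peak = shaped_step_reward(energy_cost=4.5, discomfort=0.1, power_kw=55.0, peak_so_far_kw=peak)
print(r1, r2, peak)  # only the first step pays a demand-charge increment
```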

Outline. The rest of the paper is organized as follows. Section 2 reviews related work on applying MPC and DRL to building HVAC control and introduces time-varying costs and demand charges. Section 3 provides an overview of POMDPs and DQN, and Section 4 describes the system in detail, including the dynamics, cost function, and rule-based policies. The proposed approaches and techniques are presented in Section 5, and the simulation experiments, including results and discussion, are given in Section 6. Finally, concluding remarks and possible directions for future research are summarized in Section 7.


Related work and novelty

In this section, we review recent related works that use MPC and DRL approaches, and we discuss the novelty and significance of our work.

Preliminaries

We provide background information in this section before the proposed method is introduced in the following sections. We first present the POMDP formulation and then revisit the Deep Q-Network (DQN).
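
For reference, the minimal sketch below shows the standard DQN update on a batch of transitions: a Bellman target computed with a frozen target network and a squared TD-error loss. The network sizes, dimensions, and hyperparameters are placeholders, not those used in this paper.

```python
import torch
import torch.nn as nn

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Standard DQN loss: squared TD error against a frozen target network.

    batch: (obs, action, reward, next_obs, done) tensors.
    """
    obs, action, reward, next_obs, done = batch
    q_sa = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():
        max_next_q = target_net(next_obs).max(dim=1).values       # max_a' Q_target(s', a')
        target = reward + gamma * (1.0 - done) * max_next_q       # Bellman target
    return nn.functional.mse_loss(q_sa, target)

# Minimal usage with placeholder dimensions (observation dim 8, 5 discrete actions):
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 5))
target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 5))
target_net.load_state_dict(q_net.state_dict())
batch = (torch.randn(32, 8), torch.randint(0, 5, (32,)), torch.randn(32),
         torch.randn(32, 8), torch.zeros(32))
loss = dqn_loss(q_net, target_net, batch)
loss.backward()
```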

System description

Before discussing our proposed approach, we describe our system of interest, including relevant dynamics, cost structure, action/observation spaces, and rule-based policies.

Proposed approach

While DRL is a very powerful general paradigm for learning to look ahead and maximize future rewards, in practice the current training algorithms and techniques are often constrained by the limited availability of training samples (spanning different states and actions), imperfect reward assignment schemes (assigning suboptimal rewards and penalties for actions taken in different states), and the length of the reward horizon (the longer the horizon, the less effect a future reward has on the present decision).
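
The snippet below illustrates one way such an action processor could work: the DQN proposes a discrete cooling setpoint, and the processor projects it into the occupancy-dependent comfort band before it is passed to the local PI loop. This is a hypothetical construction for illustration; the paper's actual action processor may differ, and the band values are not taken from the paper.

```python
def process_action(proposed_setpoint, occupied, comfort_band=(22.0, 26.0),
                   unoccupied_band=(18.0, 30.0)):
    """Project a DQN-proposed zone cooling setpoint [degC] into the allowed band.

    During occupied hours the setpoint is clipped to the user-defined comfort
    band; otherwise a wider band is allowed so the agent can pre-cool.
    (Band values are illustrative, not taken from the paper.)
    """
    lo, hi = comfort_band if occupied else unoccupied_band
    return min(max(proposed_setpoint, lo), hi)

# Example: an aggressive pre-cooling action is allowed at 5 am but clipped at 2 pm.
print(process_action(19.0, occupied=False))  # -> 19.0
print(process_action(19.0, occupied=True))   # -> 22.0
```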

Simulation testbed

To test our proposed approach on the system described in Section 4, we make use of a simplified zone simulator developed for this and other work. Internally, the simulator models the zone using a second-order linear model, with one state for the (uniform) air temperature in the zone and a second state for the (uniform) temperature of the solid mass surrounding the zone. Both states receive direct heat loads from solar radiation, plug loads, and internal occupants. In addition, the two states exchange heat with each other.
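
A minimal sketch of such a second-order zone model is given below as a discrete-time update of the air and mass temperatures. All capacitances, resistances, load values, and the 50/50 split of internal gains are hypothetical placeholders rather than the simulator's actual parameters.

```python
def zone_step(T_air, T_mass, T_ambient, q_hvac_kw, q_internal_kw, dt_s=300.0,
              C_air=5.0e6, C_mass=5.0e7, R_air_mass=0.002, R_air_amb=0.01):
    """One Euler step of a two-state (air + mass) linear thermal model.

    T_air, T_mass : zone air and solid-mass temperatures [degC]
    q_hvac_kw     : heat delivered to the air node (negative = cooling) [kW]
    q_internal_kw : solar, plug, and occupant gains, split between the nodes [kW]
    C_*           : thermal capacitances [J/K]; R_* : thermal resistances [K/W]
    (all parameter values here are illustrative, not the paper's)
    """
    q_air = 1000.0 * (q_hvac_kw + 0.5 * q_internal_kw)   # W into the air node
    q_mass = 1000.0 * (0.5 * q_internal_kw)              # W into the mass node
    dT_air = (q_air + (T_mass - T_air) / R_air_mass
              + (T_ambient - T_air) / R_air_amb) / C_air
    dT_mass = (q_mass + (T_air - T_mass) / R_air_mass) / C_mass
    return T_air + dt_s * dT_air, T_mass + dt_s * dT_mass

# Example: early-morning pre-cooling (no internal gains) cools the air and
# slowly draws down the mass temperature, which later absorbs afternoon gains.
T_air, T_mass = 24.0, 24.0
for _ in range(36):  # 3 h of cooling at 5-min steps
    T_air, T_mass = zone_step(T_air, T_mass, T_ambient=30.0,
                              q_hvac_kw=-5.0, q_internal_kw=0.0)
print(round(T_air, 2), round(T_mass, 2))
```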

Conclusions and future work

In this paper, we have investigated the use of reinforcement learning for the problem of obtaining energy cost savings in buildings in the presence of time-varying electricity price profiles and demand charges. This problem continues to be a challenging one in building HVAC systems. We have proposed improvements to a traditional DQN that include an action processor, model selection, and reward shaping techniques for dealing with limited training data and the sparse reward caused by the demand charge.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (47)

  • G.P. Henze et al., Evaluation of optimal control for active and passive building thermal storage, Int. J. Therm. Sci. (2004)
  • J. Ma et al., Demand reduction in building energy systems based on economic model predictive control, Chem. Eng. Sci. (2012)
  • F. Smarra et al., Data-driven model predictive control using random forests for building energy optimization and climate control, Appl. Energy (2018)
  • S. Liu et al., Experimental analysis of simulated reinforcement learning control for active and passive building thermal storage inventory: Part 1. Theoretical foundation, Energy Build. (2006)
  • S. Liu et al., Experimental analysis of simulated reinforcement learning control for active and passive building thermal storage inventory: Part 2. Results and analysis, Energy Build. (2006)
  • Z. Wang et al., Reinforcement learning for building controls: The opportunities and challenges, Appl. Energy (2020)
  • U. EIA, Annual Energy Review 2018, Energy Information...
  • Yang, Y., Hu, G., & Spanos, C. J. (2020). HVAC Energy Cost Optimization for a Multizone Building via a Decentralized...
  • V.L. Erickson et al., Occupancy based demand response HVAC control strategy
  • R. Eini et al., Learning-based model predictive control for smart building thermal management
  • Y. Wang et al., Fast model predictive control using online optimization, IEEE Trans. Control Syst. Technol. (2009)
  • O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A.S. Vezhnevets, M. Yeo, A. Makhzani, H. Küttler, J. Agapiou, J....
  • S. Gu et al., Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
