Monitoring data-driven Reinforcement Learning controller training: A comparative study of different training strategies for a real-world energy system
Energy and Buildings (IF 6.6), Pub Date: 2021-03-02, DOI: 10.1016/j.enbuild.2021.110856
Thomas Schreiber , Christoph Netsch , Marc Baranski , Dirk Müller

With the increasing complexity of building energy systems and rising shares of renewable energies in the grids, the requirements for building automation and control systems (BACS) are growing. Storage systems enable the decoupling of energy demand and supply and allow dynamic constraints to be considered in the control of the systems. The resulting optimization problem is very challenging to solve with state-of-the-art rule-based control (RBC) approaches. Model Predictive Control (MPC), on the other hand, allows nearly optimal operation but comes with expensive modeling efforts and high computational costs. These drawbacks are contrasted by promising results from the field of Reinforcement Learning (RL). RL can be model-free, is highly adaptive, and learns a policy by interacting with the controlled system. However, the literature also raises a number of questions that must be answered before RL for BACS can be realized. One is the slow convergence of the training process, which makes a pre-training strategy necessary. We therefore design and compare different pre-training workflows for a real-world energy system in a demand response scenario. We apply a data-driven approach that covers all aspects from raw monitoring data to the trained algorithm. The considered energy system consists of two compression chillers and an ice storage. The objective of the control task is to charge and discharge the storage with respect to dynamic constraints. We use machine learning models of the energy system to train and evaluate a state-of-the-art RL algorithm (DQN) under five different pre-training strategies. We compare online and offline training as well as initialization of the RL controller with a guiding RBC. We demonstrate that offline training with a guiding RBC provides stable learning and an RL controller that always outperforms this guiding RBC. Unguided exploration, on the other hand, leads to higher accumulated cost savings. Based on our findings, we derive recommendations for practical application and future research questions.
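To illustrate the idea of pre-training a DQN with a guiding RBC versus unguided exploration, the following is a minimal sketch, not the authors' implementation. The environment interface (env.reset/env.step over a data-driven system model), the price/state-of-charge thresholds in the rule-based policy, the three-action set, and the linear QNet stand-in for the DQN network are all illustrative assumptions.

```python
# Minimal sketch of offline DQN pre-training with a guiding rule-based
# controller (RBC). All names and thresholds below are hypothetical.
import random
from collections import deque

import numpy as np

ACTIONS = [0, 1, 2]          # e.g. idle, charge ice storage, discharge ice storage
GAMMA, EPS, LR = 0.99, 0.1, 1e-3

def rbc_policy(state):
    """Hypothetical guiding RBC: charge when electricity is cheap and the
    storage is not full, discharge when it is expensive and not empty."""
    price, soc = state[0], state[1]
    if price < 0.2 and soc < 0.9:
        return 1
    if price > 0.6 and soc > 0.1:
        return 2
    return 0

class QNet:
    """Tiny linear Q-function as a stand-in for the DQN network."""
    def __init__(self, n_features, n_actions):
        self.w = np.zeros((n_actions, n_features))

    def q_values(self, state):
        return self.w @ state

    def update(self, state, action, target):
        td_error = target - self.q_values(state)[action]
        self.w[action] += LR * td_error * state

def select_action(qnet, state, guided):
    """Guided mode follows the RBC; otherwise epsilon-greedy on Q-values."""
    if guided:
        return rbc_policy(state)
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return int(np.argmax(qnet.q_values(state)))

def pretrain_offline(env, episodes, guided=True):
    """Collect transitions from the (learned) system model and run Q-updates."""
    qnet = QNet(n_features=4, n_actions=len(ACTIONS))
    replay = deque(maxlen=10_000)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = select_action(qnet, state, guided)
            next_state, reward, done = env.step(action)  # data-driven system model
            replay.append((state, action, reward, next_state, done))
            s, a, r, s2, d = random.choice(replay)
            target = r + (0.0 if d else GAMMA * np.max(qnet.q_values(s2)))
            qnet.update(s, a, target)
            state = next_state
    return qnet
```

Calling pretrain_offline(env, episodes, guided=False) would correspond to unguided exploration, while guided=True mirrors the RBC-guided strategy compared in the study.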




Updated: 2021-03-15