Exploiting the Flexibility Inside Park-Level Commercial Buildings Considering Heat Transfer Time Delay: A Memory-Augmented Deep Reinforcement Learning Approach
IEEE Transactions on Sustainable Energy (IF 8.6), Pub Date: 2021-08-30, DOI: 10.1109/tste.2021.3107439
Haotian Zhao, Bin Wang, Haotian Liu, Hongbin Sun, Zhaoguang Pan, Qinglai Guo

The energy consumed by commercial buildings for heating and cooling has increased significantly. To better cope with the uncertainty introduced by the high penetration of renewable generation units, exploiting the potential flexibility of commercial buildings is an effective solution. Because the parameters of commercial buildings are not identified on site, deep reinforcement learning (DRL) methods are integrated into the energy management problem. However, time delay commonly occurs in district heating systems due to the heat transfer process; it causes asynchronization between the state and the action and poses challenges to conventional DRL algorithms. In this paper, the asynchronization between the state and the action is eliminated by expanding the state space to a larger but partially observed space, and the dispatch model for the district heating system is formulated as a partially observable Markov decision process (POMDP). Based on the finite difference method, this paper proposes a novel memory-augmented (MA) DRL method that uses a dueling network structure to cope with the time delay caused by the heat transfer process. The memory size is derived mathematically to meet a given accuracy requirement. Results from a case study of an industrial park demonstrate the satisfactory performance of the proposed method.
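Two ingredients named in the abstract can be illustrated generically: expanding the state with a memory of recent actions so that effects still "in transit" through the pipes remain visible, and the dueling network head. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class `MemoryAugmentedState`, the function `dueling_q`, and the choice to store past actions (rather than past observations) are hypothetical; the paper's finite-difference derivation of the memory size is not reproduced, so `memory_size` is simply passed in. The aggregation shown is the common mean-subtracted form of the dueling architecture; the paper's exact variant is not stated in the abstract.

```python
from collections import deque

import numpy as np


class MemoryAugmentedState:
    """Hypothetical helper: appends the last `memory_size` actions to the
    current observation, so actions whose thermal effect has not yet
    reached the building stay visible to the agent."""

    def __init__(self, action_dim: int, memory_size: int) -> None:
        # Start with zero actions; deque(maxlen=...) drops the oldest entry
        # automatically once the memory is full.
        self.buffer = deque(
            (np.zeros(action_dim) for _ in range(memory_size)),
            maxlen=memory_size,
        )

    def push(self, action) -> None:
        """Record the action just applied to the heating system."""
        self.buffer.append(np.asarray(action, dtype=float))

    def build(self, obs) -> np.ndarray:
        """Return [current observation, oldest action, ..., newest action]."""
        return np.concatenate([np.asarray(obs, dtype=float), *self.buffer])


def dueling_q(value, advantages) -> np.ndarray:
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-subtracted
    dueling aggregation: Q = V + A - mean_a(A)."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean(axis=-1, keepdims=True)


if __name__ == "__main__":
    memory = MemoryAugmentedState(action_dim=1, memory_size=3)
    memory.push([0.5])  # e.g. a hypothetical normalized supply-temperature setpoint
    # Augmented state: the observation followed by the three most recent actions.
    print(memory.build([20.0, 18.5]))
    # Q-values for three candidate actions from one value and three advantages.
    print(dueling_q(1.0, [1.0, 2.0, 3.0]))
```

For a pure transport delay of d control steps, keeping at least the last d actions in the state is what restores the Markov property; the paper instead sizes the memory from its finite-difference heat transfer model and an accuracy target, which this sketch does not attempt.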

Updated: 2021-08-30