Residual Physics and Post-Posed Shielding for Safe Deep Reinforcement Learning Method
IEEE Transactions on Cybernetics ( IF 9.4 ) Pub Date : 6-14-2022 , DOI: 10.1109/tcyb.2022.3178084
Qingang Zhang 1 , Muhammad Haiqal Bin Mahbod 1 , Chin-Boon Chng 1 , Poh-Seng Lee 1 , Chee-Kong Chui 1

Deep reinforcement learning (DRL) has been studied for the control of computer room air conditioning units in data centers (DCs). However, two main issues limit the deployment of DRL in actual systems. First, a large amount of data is needed. Second, because a DC is a mission-critical system, safe control must be guaranteed, and temperatures in DCs should be kept within a certain operating range. To mitigate these issues, this article proposes a novel control method, RP-SDRL. First, Residual Physics, built using the first law of thermodynamics, is integrated with the DRL algorithm and a Prediction Model. Subsequently, a Correction Model adapted from gradient descent is combined with the Prediction Model as Post-Posed Shielding to enforce safe actions. The RP-SDRL method was validated in simulation. Noise was added to the states of the model to further test its performance under state uncertainty. Experimental results show that combining Residual Physics with DRL can significantly improve the initial policy, sample efficiency, and robustness. Residual Physics can also improve the sample efficiency and accuracy of the Prediction Model. While DRL alone cannot avoid constraint violations, RP-SDRL can detect unsafe actions and significantly reduce violations. Compared with the baseline controller, about 13% of electricity usage can be saved.
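
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a one-zone energy balance as the physics term, a placeholder function standing in for a learned residual, and a shield that nudges the proposed cooling power by gradient descent until the predicted temperature is back under a limit. Every name and number in it (`physics_next_temp`, `learned_residual`, `shield`, the 27 °C limit, the 10 kW load, the heat capacity, the step size, the margin) is hypothetical and chosen for illustration.

```python
A_MAX_KW = 20.0  # hypothetical maximum cooling power


def physics_next_temp(temp_c, cooling_kw, load_kw=10.0,
                      heat_cap_kj_per_k=500.0, dt_s=60.0):
    """First-law energy balance for one lumped zone (assumed model):
    dT = (Q_load - Q_cool) * dt / C."""
    return temp_c + (load_kw - cooling_kw) * dt_s / heat_cap_kj_per_k


def learned_residual(temp_c, cooling_kw):
    """Placeholder for a trained network that corrects the physics
    estimate; a fixed toy function here, not a real model."""
    return 0.01 * (temp_c - 25.0)


def predict_next_temp(temp_c, cooling_kw):
    """Prediction model sketched as physics term plus residual term."""
    return physics_next_temp(temp_c, cooling_kw) + learned_residual(temp_c, cooling_kw)


def shield(temp_c, cooling_kw, t_max=27.0, margin=0.2,
           lr=20.0, max_steps=200, eps=1e-3):
    """Check the proposed action after the policy has produced it and,
    if the predicted temperature exceeds t_max, correct the action by
    gradient descent on 0.5 * (prediction - (t_max - margin))**2."""
    action = float(cooling_kw)
    target = t_max - margin  # aim slightly inside the safe range
    for _ in range(max_steps):
        predicted = predict_next_temp(temp_c, action)
        if predicted <= t_max:
            break  # action is predicted to be safe; stop correcting
        # Central finite difference of the prediction w.r.t. the action.
        grad = (predict_next_temp(temp_c, action + eps)
                - predict_next_temp(temp_c, action - eps)) / (2.0 * eps)
        action -= lr * (predicted - target) * grad
        action = min(max(action, 0.0), A_MAX_KW)  # keep within actuator limits
    return action


proposed = 5.0                      # action suggested by a (hypothetical) policy
corrected = shield(26.9, proposed)  # unsafe proposal gets more cooling
unchanged = shield(24.0, proposed)  # safe proposal passes through as-is
```

Aiming at a target slightly below the limit is a design choice of this sketch: with a fixed step size the correction approaches its target geometrically, so targeting the limit itself would never quite bring the prediction under it.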

Updated: 2024-08-22