Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems
arXiv - CS - Artificial Intelligence. Pub Date: 2020-11-22, DOI: arxiv-2011.10897
Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

Typical reinforcement learning (RL) methods show limited applicability to real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. The algorithm has two main features. First, we devise two distance-based Q-value update schemes, an incentive update and a penalty update, which together form a distance-based incentive/penalty (DIP) update technique that enables the agent to select both discrete and continuous actions within the feasible region and to update the values of both types of actions. Second, we propose a method for defining the penalty cost as a shadow price-weighted penalty, which offers two advantages over previous approaches in efficiently steering the agent away from infeasible actions. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.
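Since the abstract does not give the paper's exact update rules, the following is a minimal Python sketch of how a distance-based incentive/penalty Q-target with a shadow price-weighted penalty might look for a simple box-constrained action space. All names here (project_to_feasible, shadow_price_penalty, dip_target) and the box constraint itself are hypothetical illustrations, not the authors' implementation.

import numpy as np

def project_to_feasible(a, lo, hi):
    # Project an action onto a box feasible region [lo, hi]
    # (a stand-in for whatever constraint set the real system has).
    return np.clip(a, lo, hi)

def shadow_price_penalty(a, lo, hi, shadow_prices):
    # Penalty cost: per-constraint violations, each weighted by the
    # shadow price of the corresponding constraint (assumed given).
    violation = np.maximum(a - hi, 0.0) + np.maximum(lo - a, 0.0)
    return float(shadow_prices @ violation)

def dip_target(q_next_max, r, a, lo, hi, shadow_prices, gamma=0.99):
    # Incentive update: a feasible action gets the ordinary TD target.
    # Penalty update: an infeasible action gets its target reduced by a
    # shadow price-weighted cost that grows with the distance to the
    # feasible region, pushing its Q-value below feasible alternatives.
    a_feas = project_to_feasible(a, lo, hi)
    distance = np.linalg.norm(a - a_feas)
    if distance == 0.0:
        return r + gamma * q_next_max                     # incentive update
    penalty = shadow_price_penalty(a, lo, hi, shadow_prices)
    return r - penalty + gamma * q_next_max               # penalty update

# Toy usage: a 2-D continuous action; the second one violates the box.
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
prices = np.array([5.0, 2.0])  # assumed shadow prices of the two constraints
print(dip_target(q_next_max=10.0, r=1.0,
                 a=np.array([0.5, 0.5]), lo=lo, hi=hi, shadow_prices=prices))
print(dip_target(q_next_max=10.0, r=1.0,
                 a=np.array([1.4, 0.5]), lo=lo, hi=hi, shadow_prices=prices))

One point such a scheme illustrates: infeasible actions are not masked out of the action space but kept in the value function with distance-graded penalties, so the agent learns to rank them below feasible actions rather than never seeing them.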

Updated: 2020-11-25