Using process data to generate an optimal control policy via apprenticeship and reinforcement learning
AIChE Journal (IF 3.5), Pub Date: 2021-05-07, DOI: 10.1002/aic.17306
Max R. Mowbray, Robin Smith, Ehecatl A. Del Rio‑Chanona, Dongda Zhang

Reinforcement learning (RL) is a data-driven approach to synthesizing an optimal control policy. A barrier to wide implementation of RL-based controllers is their data-hungry nature during online training and their inability to extract useful information from human operator and historical process operation data. Here, we present a two-step framework to resolve this challenge. First, we employ apprenticeship learning via inverse RL to analyze historical process data for synchronous identification of a reward function and parameterization of the control policy. This step is conducted offline. Second, the parameterization is improved online efficiently during the ongoing process via RL within only a few iterations. Significant advantages of this framework include allowing the hot-start of RL algorithms for process optimal control, and robust abstraction of existing controllers and control knowledge from data. The framework is demonstrated on three case studies, showing its potential for chemical process control.
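To make the two steps concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation: a toy setpoint-tracking process stands in for the plant, behavioural cloning plus feature-expectation matching (in the spirit of apprenticeship learning via inverse RL) provide the offline reward identification and policy warm start, and a few REINFORCE iterations provide the online improvement. The dynamics, feature map, setpoint, and "historical" demonstration data are all assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only (not the paper's code): a tiny setpoint-tracking
# process, a softmax policy over three discrete control moves, a linear reward
# in hand-picked features, and the two framework steps:
#   1) offline apprenticeship/inverse RL from "historical" trajectories,
#   2) a few online policy-gradient (REINFORCE) iterations.
rng = np.random.default_rng(0)
N_ACTIONS, GAMMA, HORIZON = 3, 0.95, 20


def phi(s):
    """Assumed feature map: setpoint deviation, its square, and a bias."""
    d = s - 1.0
    return np.array([d, d * d, 1.0])


def plant(s, a):
    """Toy surrogate for the process: apply a control move plus noise."""
    return s + (a - 1) * 0.1 + 0.01 * rng.standard_normal()


def policy(theta, s):
    """Softmax policy pi(a|s) with linear-in-features action scores."""
    z = theta @ phi(s)
    p = np.exp(z - z.max())
    return p / p.sum()


def rollout(theta, s0=0.5):
    s, traj = s0, []
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=policy(theta, s))
        traj.append((s, a))
        s = plant(s, a)
    return traj


def feature_expectations(trajs):
    """Discounted feature expectations, as used in apprenticeship learning."""
    mu = np.zeros(3)
    for tr in trajs:
        for t, (s, _) in enumerate(tr):
            mu += GAMMA ** t * phi(s)
    return mu / len(trajs)


# ---- Step 1 (offline): apprenticeship learning via inverse RL ---------------
# Hypothetical historical data: trajectories from an existing (decent) controller.
expert_theta = np.array([[4.0, -1.0, 0.0], [0.0, 0.0, 0.0], [-4.0, -1.0, 0.0]])
demos = [rollout(expert_theta) for _ in range(100)]

# (a) Warm-start the policy parameterization by behavioural cloning of the demos.
theta = np.zeros((N_ACTIONS, 3))
for _ in range(200):
    s, a = demos[rng.integers(len(demos))][rng.integers(HORIZON)]
    grad = -policy(theta, s)[:, None] * phi(s)[None, :]
    grad[a] += phi(s)                      # gradient of log pi(a|s)
    theta += 0.1 * grad                    # ascend demo-action log-likelihood

# (b) Identify a linear reward r(s) = w . phi(s) by feature-expectation matching.
mu_expert = feature_expectations(demos)
mu_policy = feature_expectations([rollout(theta) for _ in range(100)])
w = mu_expert - mu_policy

# ---- Step 2 (online): improve the policy with only a few RL iterations ------
for _ in range(10):
    tr = rollout(theta)
    returns = [sum(GAMMA ** k * (w @ phi(s)) for k, (s, _) in enumerate(tr[t:]))
               for t in range(HORIZON)]
    for t, (s, a) in enumerate(tr):
        grad = -policy(theta, s)[:, None] * phi(s)[None, :]
        grad[a] += phi(s)
        theta += 0.01 * (GAMMA ** t) * returns[t] * grad   # REINFORCE update

print("learned reward weights:", w)
```

In this sketch the single feature-matching step plays the role of the reward identification, and the warm-started policy lets the online REINFORCE phase converge in far fewer interactions than training from scratch; the paper's actual algorithms and case studies are more involved.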

Updated: 2021-05-07