Online reinforcement learning for a continuous space system with experimental validation
Journal of Process Control (IF 4.2), Pub Date: 2021-06-27, DOI: 10.1016/j.jprocont.2021.06.004
Oguzhan Dogru , Nathan Wieczorek , Kirubakaran Velswamy , Fadi Ibrahim , Biao Huang

Reinforcement learning (RL) for continuous state/action space systems remains a challenge for nonlinear multivariate dynamical systems, even at the simulation level, and implementing such schemes for real-time control remains largely an open problem. In this study, several critical strategies for the practical implementation of RL are developed, and a multivariable, multi-modal, hybrid three-tank (HTT) physical process is used to illustrate the proposed strategies. A successful real-time implementation of RL is reported. The first step is meta-heuristic optimization of first-principles model parameters, in which a custom pseudo-random binary signal (PRBS) is used to obtain open-loop experimental data. This is followed by in silico policy learning based on the asynchronous advantage actor–critic (A3C/A-A2C) algorithm. In the second step, three different approaches (namely proximal learning, single-trajectory learning, and multiple-trajectory learning) are used to explore the state/action space. In the final step, online learning (A2C) is established on the real process over a socket connection, initialized with the best in silico policy. The extent of exploration (EoE), a measure of exploration, is proposed as a parameter for quantifying exploration of the state/action space. To enhance the online sample efficiency of the RL application, a soft-constraint-based constrained learning scheme is proposed and validated. With the proposed strategies taken together, this work demonstrates the feasibility of applying RL to solve practical control problems.
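As a rough illustration of the identification step, the sketch below generates a generic PRBS excitation with randomized hold times; the paper's "custom PRBS" design details (amplitudes, switching band, dwell times) are not given in the abstract, so all parameter values here are illustrative assumptions.

```python
import numpy as np

def prbs(n_steps, hold_min, hold_max, levels=(-1.0, 1.0), seed=0):
    """Pseudo-random binary signal with randomized hold (dwell) times.

    A generic sketch for open-loop excitation; the actual PRBS used in
    the paper may differ in levels and switching statistics.
    """
    rng = np.random.default_rng(seed)
    signal = np.empty(n_steps)
    i = 0
    while i < n_steps:
        hold = int(rng.integers(hold_min, hold_max + 1))  # random dwell time
        level = rng.choice(levels)                        # random binary level
        signal[i:i + hold] = level
        i += hold
    return signal[:n_steps]

u = prbs(500, hold_min=10, hold_max=40)  # excitation for open-loop experiments
```

The resulting sequence would be applied to the plant inputs, and the recorded input/output data then drive the meta-heuristic fit of the first-principles model parameters.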



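The soft-constraint idea mentioned above can be sketched as a penalty term added to the reward: instead of hard-clipping actions or states, constraint violations reduce the reward smoothly. The variable names, bounds, and quadratic penalty form below are illustrative assumptions; the abstract only states that a soft-constraint-based formulation is used.

```python
def shaped_reward(tracking_error, level, low, high, weight=10.0):
    """Reward with a soft (penalty-based) constraint on a tank level.

    - tracking_error: setpoint deviation (drives the control objective)
    - level, low, high: constrained variable and its assumed bounds
    - weight: penalty weight trading off tracking vs. constraint satisfaction
    """
    # Amount by which the level leaves the [low, high] band (zero if inside)
    violation = max(low - level, 0.0) + max(level - high, 0.0)
    # Quadratic penalty keeps the objective smooth near the boundary
    return -abs(tracking_error) - weight * violation ** 2

r_inside = shaped_reward(0.5, level=0.3, low=0.1, high=0.6)  # no penalty
r_outside = shaped_reward(0.5, level=0.8, low=0.1, high=0.6)  # penalized
```

Because the penalty is differentiable away from the bounds, it folds directly into the actor–critic objective without modifying the learning algorithm itself.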

Updated: 2021-06-28