Online reinforcement learning for a continuous space system with experimental validation
Journal of Process Control (IF 4.2), Pub Date: 2021-06-27, DOI: 10.1016/j.jprocont.2021.06.004
Oguzhan Dogru , Nathan Wieczorek , Kirubakaran Velswamy , Fadi Ibrahim , Biao Huang

Reinforcement learning (RL) for continuous state/action space systems remains a challenge for nonlinear multivariate dynamical systems, even at the simulation level, and implementing such schemes for real-time control remains largely an open problem. In this study, several critical strategies for the practical implementation of RL are developed, and a multivariable, multi-modal, hybrid three-tank (HTT) physical process is used to illustrate the proposed strategies. A successful real-time implementation of RL is reported. The first step is meta-heuristic optimization of first-principles model parameters, in which a custom pseudo-random binary signal (PRBS) is used to obtain open-loop experimental data. This is followed by in silico policy learning based on the asynchronous advantage actor–critic (A3C/A-A2C) algorithm. In the second step, three different approaches (namely proximal learning, single-trajectory learning, and multiple-trajectory learning) are used to explore the state/action space. In the final step, online learning (A2C) is established on the real process over a socket connection, initialized with the best in silico policy. The extent of exploration (EoE), a measure of exploration, is proposed as a parameter for quantifying exploration of the state/action space. To enhance the online sample efficiency of the RL application, a soft-constraint-based constrained learning scheme is proposed and validated. With the proposed strategies taken together, this work demonstrates the feasibility of applying RL to solve practical control problems.
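As a rough illustration of the identification step, the sketch below generates a generic PRBS excitation with randomized hold times; the paper's "custom PRBS" design details (amplitudes, switching band, dwell times) are not given in the abstract, so all parameter values here are illustrative assumptions.

```python
import numpy as np

def prbs(n_steps, hold_min, hold_max, levels=(-1.0, 1.0), seed=0):
    """Pseudo-random binary signal with randomized hold (dwell) times.

    A generic sketch for open-loop excitation; the actual PRBS used in
    the paper may differ in levels and switching statistics.
    """
    rng = np.random.default_rng(seed)
    signal = np.empty(n_steps)
    i = 0
    while i < n_steps:
        hold = int(rng.integers(hold_min, hold_max + 1))  # random dwell time
        level = rng.choice(levels)                        # random binary level
        signal[i:i + hold] = level
        i += hold
    return signal[:n_steps]

u = prbs(500, hold_min=10, hold_max=40)  # excitation for open-loop experiments
```

The resulting sequence would be applied to the plant inputs, and the recorded input/output data then drive the meta-heuristic fit of the first-principles model parameters.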



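The soft-constraint idea mentioned above can be sketched as a penalty term added to the reward: instead of hard-clipping actions or states, constraint violations reduce the reward smoothly. The variable names, bounds, and quadratic penalty form below are illustrative assumptions; the abstract only states that a soft-constraint-based formulation is used.

```python
def shaped_reward(tracking_error, level, low, high, weight=10.0):
    """Reward with a soft (penalty-based) constraint on a tank level.

    - tracking_error: setpoint deviation (drives the control objective)
    - level, low, high: constrained variable and its assumed bounds
    - weight: penalty weight trading off tracking vs. constraint satisfaction
    """
    # Amount by which the level leaves the [low, high] band (zero if inside)
    violation = max(low - level, 0.0) + max(level - high, 0.0)
    # Quadratic penalty keeps the objective smooth near the boundary
    return -abs(tracking_error) - weight * violation ** 2

r_inside = shaped_reward(0.5, level=0.3, low=0.1, high=0.6)  # no penalty
r_outside = shaped_reward(0.5, level=0.8, low=0.1, high=0.6)  # penalized
```

Because the penalty is differentiable away from the bounds, it folds directly into the actor–critic objective without modifying the learning algorithm itself.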

Updated: 2021-06-28