A hybrid MPC for constrained deep reinforcement learning applied for planar robotic arm
ISA Transactions (IF 7.3) Pub Date: 2021-04-01, DOI: 10.1016/j.isatra.2021.03.046
Mostafa Al-Gabalawy

Recently, deep reinforcement learning techniques have achieved tangible results in learning high-dimensional control tasks. Due to the trial-and-error interaction between the autonomous agent and the environment, the learning phase is unconstrained and limited to the simulator. Such exploration has the additional drawback of consuming unnecessary samples at the beginning of the learning process. Model-based algorithms, on the other hand, handle this issue by learning the dynamics of the environment. However, model-free algorithms have higher asymptotic performance than model-based ones. The main contribution of this paper is to construct a hybrid algorithm from model predictive control (MPC) and deep reinforcement learning (DRL), called MPC-DRL, that exploits the benefits of both methods to satisfy constraint conditions throughout the learning process. The validity of the proposed approach is demonstrated by learning a reachability task. The results show complete satisfaction of the constraint condition, represented by a static obstacle, with fewer samples and higher performance compared to state-of-the-art model-free algorithms.
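To make the hybrid idea concrete, the following is a minimal sketch, not the paper's implementation: a DRL policy proposes a joint-velocity action for a 2-link planar arm, and a short-horizon MPC-style safety layer replaces it with the nearest constraint-satisfying alternative whenever its predicted trajectory would enter the static obstacle. All names, dynamics, and parameters (link lengths, horizon, obstacle geometry, the sampling-based candidate search) are illustrative assumptions.

```python
import numpy as np

HORIZON = 5                        # MPC lookahead steps (assumed)
DT = 0.05                          # integration step in seconds (assumed)
OBSTACLE = np.array([0.5, 0.5])    # static obstacle centre (assumed)
RADIUS = 0.15                      # obstacle clearance radius (assumed)
LINKS = (0.4, 0.4)                 # link lengths of the planar arm (assumed)

def forward_kinematics(q):
    """End-effector position of a 2-link planar arm (standard kinematics)."""
    x = LINKS[0] * np.cos(q[0]) + LINKS[1] * np.cos(q[0] + q[1])
    y = LINKS[0] * np.sin(q[0]) + LINKS[1] * np.sin(q[0] + q[1])
    return np.array([x, y])

def rollout_is_safe(q, u):
    """Roll a joint-velocity command forward over the horizon and check
    the static-obstacle constraint at every predicted step."""
    q = np.array(q, dtype=float)
    for _ in range(HORIZON):
        q = q + DT * u                         # simple kinematic model (assumed)
        if np.linalg.norm(forward_kinematics(q) - OBSTACLE) < RADIUS:
            return False
    return True

def mpc_safety_filter(q, u_rl, n_candidates=64, rng=None):
    """Return the action closest to the DRL proposal u_rl whose predicted
    trajectory satisfies the constraint; stop the arm if none is found."""
    if rng is None:
        rng = np.random.default_rng(0)
    if rollout_is_safe(q, u_rl):
        return u_rl                            # DRL action already safe
    candidates = u_rl + rng.normal(scale=0.5, size=(n_candidates, 2))
    safe = [u for u in candidates if rollout_is_safe(q, u)]
    if not safe:
        return np.zeros(2)                     # conservative fallback: stop
    return min(safe, key=lambda u: float(np.linalg.norm(u - u_rl)))

# Usage: the environment only ever executes the filtered action, so
# exploration never violates the obstacle constraint during training.
q_now = np.array([0.1, 0.2])       # current joint angles (example)
u_rl = np.array([1.0, -0.5])       # joint-velocity action from the DRL policy
u_safe = mpc_safety_filter(q_now, u_rl)
```

In the paper's setting the filtering role would be played by a proper constrained MPC optimization over a model of the arm; the random candidate search above only stands in for that solve to keep the sketch self-contained.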



