Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient.
Frontiers in Neurorobotics (IF 2.6), Pub Date: 2020-03-27, DOI: 10.3389/fnbot.2020.00021
Junjie Cao¹, Weiwei Liu¹, Yong Liu¹, Jian Yang²

There has been substantial growth in research on robot automation, which aims to make robots capable of directly interacting with the world or with humans. Robot learning from human demonstration is central to this goal. However, dependence on demonstrations restricts the robot to a fixed scenario, leaving it unable to explore variant situations in order to accomplish the same task as in the demonstration. Deep reinforcement learning may allow a robot to learn beyond human demonstration and fulfill the task in unknown situations, and exploration is the core of such generalization to different environments. However, exploration in reinforcement learning can be ineffective and suffers from low sample efficiency. In this paper, we present Evolutionary Policy Gradient (EPG) to make the robot learn from demonstration and perform goal-oriented exploration efficiently. Through goal-oriented exploration, our method can generalize the robot's learned skill to environments with different parameters. Evolutionary Policy Gradient combines parameter perturbation with a policy gradient method in the framework of Evolutionary Algorithms (EAs) and fuses the benefits of both, achieving effective and efficient exploration. With demonstrations guiding the evolutionary process, the robot can accelerate goal-oriented exploration and generalize its capability to variant scenarios. Experiments on robot control tasks in OpenAI Gym with dense and sparse rewards show that EPG provides competitive performance relative to the original policy gradient methods and EAs. In the manipulator task, the robot learns to open a door from vision in environments that differ from those in which the demonstrations were provided.
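The abstract describes EPG as coupling evolutionary parameter perturbation with policy-gradient updates, with demonstrations seeding the search. The sketch below is a rough illustration of how such a loop could be structured, not the authors' implementation: the toy goal-reaching environment (standing in for the OpenAI Gym tasks), the linear-Gaussian policy, the reward-weighted gradient estimate, and every hyperparameter are assumptions made purely for demonstration.

    # Minimal sketch of an Evolutionary-Policy-Gradient-style loop.
    # All names, dimensions, and hyperparameters below are illustrative
    # assumptions, not the paper's implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    OBS_DIM, ACT_DIM = 4, 2          # toy dimensions (assumption)
    POP, SIGMA, LR, N_ELITE = 10, 0.05, 1e-3, 3
    GOAL = rng.normal(size=OBS_DIM)  # "variant scenario": the goal can change

    def rollout(theta, horizon=50):
        """Run one episode of a toy goal-reaching task with a linear policy.
        Returns the episodic return and a REINFORCE-style gradient estimate."""
        W = theta.reshape(ACT_DIM, OBS_DIM)
        state = np.zeros(OBS_DIM)
        ret, grad = 0.0, np.zeros_like(W)
        for _ in range(horizon):
            prev = state
            mean = W @ prev
            act = mean + 0.1 * rng.normal(size=ACT_DIM)   # exploration noise
            state = prev + 0.1 * np.pad(act, (0, OBS_DIM - ACT_DIM))
            rew = -np.linalg.norm(state - GOAL)           # dense reward
            # score-function term of a fixed-variance Gaussian policy,
            # crudely weighted by the immediate reward (assumption)
            grad += rew * np.outer((act - mean) / 0.01, prev)
            ret += rew
        return ret, grad.ravel()

    # Demonstration guidance: seed the population near parameters obtained
    # by behavior cloning (faked here as a fixed vector, for illustration).
    theta_demo = rng.normal(scale=0.1, size=OBS_DIM * ACT_DIM)
    population = [theta_demo + SIGMA * rng.normal(size=theta_demo.size)
                  for _ in range(POP)]

    for gen in range(20):
        scored = []
        for theta in population:
            ret, grad = rollout(theta)
            # EPG-style mutation: a policy-gradient step rather than
            # purely random parameter noise
            scored.append((ret, theta + LR * grad))
        scored.sort(key=lambda p: p[0], reverse=True)
        elites = [t for _, t in scored[:N_ELITE]]
        # Refill the population with Gaussian perturbations of the elites
        population = elites + [
            elites[rng.integers(N_ELITE)]
            + SIGMA * rng.normal(size=theta_demo.size)
            for _ in range(POP - N_ELITE)
        ]
        print(f"gen {gen:02d}  best return {scored[0][0]:.2f}")

The point of the sketch is the interleaving: each individual receives a gradient-based update before selection, so the evolutionary perturbation and the policy gradient both contribute to exploration, and the demonstration-seeded initialization biases the search toward the demonstrated behavior.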



Updated: 2020-03-27