Learning image-based Receding Horizon Planning for manipulation in clutter
Robotics and Autonomous Systems (IF 4.3), Pub Date: 2021-01-23, DOI: 10.1016/j.robot.2021.103730
Wissam Bejjani, Matteo Leonetti, Mehmet R. Dogar

Manipulating an object into a desired location in a cluttered and restricted environment requires reasoning over the long-term consequences of an action while reacting locally to multiple physics-based interactions. We present Visual Receding Horizon Planning (VisualRHP) in a framework that interleaves real-world execution with look-ahead planning to efficiently solve a short-horizon approximation of a multi-step sequential decision-making problem. VisualRHP is guided by a learned heuristic that acts on an abstract, colour-labelled, image-based representation of the state. With this representation, the robot can generalize its behaviours to different environment setups, that is, different numbers and shapes of objects, while also having transferable manipulation skills that can be applied to a multitude of real-world objects. We train the heuristic with imitation and reinforcement learning in discrete and continuous action spaces. We detail our heuristic learning process for environments with sparse rewards and non-linear, non-continuous dynamics. In particular, we introduce changes necessary for improving the stability of existing reinforcement learning algorithms that use neural networks with shared parameters. In a series of simulation and real-world experiments, we show the robot performing prehensile and non-prehensile actions in synergy to successfully manipulate a variety of real-world objects in real time.
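The interleaving of execution and look-ahead planning described above can be illustrated with a generic receding horizon loop. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation: rollout actions are sampled uniformly at random (random shooting), a hand-written distance function stands in for the learned image-based heuristic, and the state is a toy 1-D integer position rather than a colour-labelled image. The names `rhp_action`, `run_episode`, `line_simulate` and `line_heuristic` are hypothetical. At each step the planner scores several short rollouts (accumulated reward plus a heuristic estimate at the horizon), executes only the first action of the best one, and replans from the resulting state.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical toy types; VisualRHP itself plans over image-based states.
State = int
Action = int
Simulator = Callable[[State, Action], Tuple[State, float, bool]]
Heuristic = Callable[[State], float]


def rhp_action(state: State, actions: Sequence[Action], simulate: Simulator,
               heuristic: Heuristic, horizon: int, n_rollouts: int,
               rng: random.Random) -> Action:
    """Return the first action of the best-scoring short-horizon rollout."""
    best_action, best_value = actions[0], float("-inf")
    for _ in range(n_rollouts):
        # Assumption: uniform random rollout actions (random shooting).
        plan = [rng.choice(actions) for _ in range(horizon)]
        s, total, done = state, 0.0, False
        for a in plan:
            s, r, done = simulate(s, a)
            total += r
            if done:
                break
        if not done:
            # The heuristic estimates the value beyond the planning horizon.
            total += heuristic(s)
        if total > best_value:
            best_value, best_action = total, plan[0]
    return best_action


def run_episode(state: State, actions: Sequence[Action], simulate: Simulator,
                heuristic: Heuristic, horizon: int = 2, n_rollouts: int = 16,
                max_steps: int = 100, seed: int = 0) -> Tuple[State, int]:
    """Interleave planning with execution until the task ends or times out."""
    rng = random.Random(seed)
    for step in range(1, max_steps + 1):
        action = rhp_action(state, actions, simulate, heuristic,
                            horizon, n_rollouts, rng)
        # Execute only the first planned action, then replan from the new state.
        state, _, done = simulate(state, action)
        if done:
            return state, step
    return state, max_steps


GOAL = 5  # hypothetical target position on a 1-D line


def line_simulate(s: State, a: Action) -> Tuple[State, float, bool]:
    """Toy dynamics: move by a, pay a cost of 1 per step, stop at GOAL."""
    s2 = s + a
    return s2, -1.0, s2 == GOAL


def line_heuristic(s: State) -> float:
    """Stand-in for a learned heuristic: negative remaining distance."""
    return -float(abs(GOAL - s))


final_state, steps = run_episode(0, (-1, 1), line_simulate, line_heuristic)
```

In this toy the same simulator plays both roles; in the setting the abstract describes, the look-ahead rollouts would run in a physics model while the chosen action is executed on the real robot, so replanning after every step lets the robot react to interactions the model predicted poorly.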




Updated: 2021-02-03