Learning a Decentralized Multi-arm Motion Planner
arXiv - CS - Multiagent Systems. Pub Date: 2020-11-05, DOI: arxiv-2011.02608
Huy Ha, Jingxi Xu, Shuran Song

We present a closed-loop multi-arm motion planner that is scalable and flexible with team size. Traditional multi-arm robot systems have relied on centralized motion planners, whose runtimes often scale exponentially with team size, and thus, fail to handle dynamic environments with open-loop control. In this paper, we tackle this problem with multi-agent reinforcement learning, where a decentralized policy is trained to control one robot arm in the multi-arm system to reach its target end-effector pose given observations of its workspace state and target end-effector pose. The policy is trained using Soft Actor-Critic with expert demonstrations from a sampling-based motion planning algorithm (i.e., BiRRT). By leveraging classical planning algorithms, we can improve the learning efficiency of the reinforcement learning algorithm while retaining the fast inference time of neural networks. The resulting policy scales sub-linearly and can be deployed on multi-arm systems with variable team sizes. Thanks to the closed-loop and decentralized formulation, our approach generalizes to 5-10 multi-arm systems and dynamic moving targets (>90% success rate for a 10-arm system), despite being trained on only 1-4 arm planning tasks with static targets. Code and data links can be found at https://multiarm.cs.columbia.edu.
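To make the decentralized formulation concrete, the sketch below (in PyTorch, not the authors' released code) shows the core idea: a single shared policy network is queried independently for each arm, given only that arm's observation of its workspace state and target end-effector pose, so the same weights can drive a team of any size. The observation and action dimensions and network widths here are illustrative assumptions, not values taken from the paper.

# Minimal sketch of a shared per-arm policy, assuming PyTorch.
# Dimensions below are placeholders, not the paper's values.
import torch
import torch.nn as nn

OBS_DIM = 64   # assumed: encoded workspace state + target end-effector pose
ACT_DIM = 6    # assumed: joint-velocity command for a 6-DoF arm

class ArmPolicy(nn.Module):
    """Gaussian policy head, shared by every arm in the team."""
    def __init__(self, obs_dim=OBS_DIM, act_dim=ACT_DIM, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        std = self.log_std(h).clamp(-5, 2).exp()
        dist = torch.distributions.Normal(self.mu(h), std)
        return torch.tanh(dist.rsample())  # squashed stochastic action, as in SAC

def act_decentralized(policy, per_arm_obs):
    """Each arm sees only its own observation; team size is arbitrary."""
    with torch.no_grad():
        return [policy(o.unsqueeze(0)).squeeze(0) for o in per_arm_obs]

if __name__ == "__main__":
    policy = ArmPolicy()
    team_obs = [torch.randn(OBS_DIM) for _ in range(10)]  # e.g. a 10-arm system
    actions = act_decentralized(policy, team_obs)
    print(len(actions), actions[0].shape)  # 10 torch.Size([6])

Because the policy conditions only on local, per-arm inputs, deploying it on a larger team means replicating the same weights rather than re-solving a joint planning problem, which is what allows the closed-loop controller to scale to team sizes beyond those seen in training; during training, the abstract's described recipe augments Soft Actor-Critic with expert trajectories from BiRRT as demonstrations.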

Updated: 2020-11-06