Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios
The International Journal of Robotics Research (IF 7.5), Pub Date: 2020-05-31, DOI: 10.1177/0278364920916531
Tingxiang Fan, Pinxin Long, Wenxi Liu, Jia Pan

Developing a safe and efficient collision-avoidance policy for multiple robots is challenging in decentralized scenarios, where each robot generates its path with only limited observation of the other robots' states and intentions. Prior distributed multi-robot collision-avoidance systems often require frequent inter-robot communication or hand-crafted agent-level features to plan a local collision-free action, which makes them fragile and computationally expensive. In addition, the performance of these methods is not comparable with their centralized counterparts in practice. In this article, we present a decentralized sensor-level collision-avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent's steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained simultaneously over a large number of robots in rich, complex environments using a policy-gradient-based reinforcement-learning algorithm. The learned policy is also integrated into a hybrid control framework to further improve its robustness and effectiveness. We validate the learned sensor-level collision-avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios, including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots.
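The interface of the sensor-level policy described above can be sketched as a small forward pass: raw laser scans, the relative goal position, and the current velocity go in; a velocity command comes out. All dimensions, layer sizes, and limits below are illustrative assumptions (the paper's actual policy is a deep network trained with a policy-gradient algorithm, not this randomly initialized stand-in):

```python
import numpy as np

# Hypothetical dimensions, for illustration only.
SCAN_BEAMS = 512      # raw 2D lidar beams per frame (assumption)
STACKED_FRAMES = 3    # consecutive scans stacked to capture motion (assumption)

rng = np.random.default_rng(0)

def init_policy(hidden=64):
    """Randomly initialized two-layer MLP standing in for the trained policy."""
    in_dim = SCAN_BEAMS * STACKED_FRAMES + 2 + 2  # scans + relative goal + current velocity
    return {
        "W1": rng.normal(0, 0.01, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.01, (hidden, 2)),   # outputs (linear v, angular w)
        "b2": np.zeros(2),
    }

def act(params, scans, goal, velocity):
    """Map raw sensor input directly to a steering command (v, w)."""
    x = np.concatenate([scans.ravel(), goal, velocity])
    h = np.tanh(x @ params["W1"] + params["b1"])
    v, w = np.tanh(h @ params["W2"] + params["b2"])
    # Clamp to hypothetical robot limits: forward-only linear velocity.
    return float(np.clip(v, 0.0, 1.0)), float(np.clip(w, -1.0, 1.0))

params = init_policy()
scans = rng.uniform(0.1, 6.0, (STACKED_FRAMES, SCAN_BEAMS))  # ranges in metres
command = act(params, scans, goal=np.array([3.0, 1.0]), velocity=np.array([0.4, 0.0]))
print(command)
```

Because the policy consumes raw measurements rather than agent-level features, it needs neither inter-robot communication nor explicit state estimates of neighboring robots at run time.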
Although the policy is trained using simulation data only, we have successfully deployed it on physical robots whose shapes and dynamics differ from those of the simulated agents, demonstrating the controller's robustness to the simulation-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution for safe and effective autonomous navigation of a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. More importantly, the policy has been successfully deployed on different types of physical robot platforms without tedious parameter tuning. Videos are available at https://sites.google.com/view/hybridmrca.
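The hybrid control framework mentioned in the abstract can be pictured as a simple supervisor: the learned policy drives the robot in the common case, and a conservative fallback takes over when an obstacle gets dangerously close. The fallback behavior and the safety radius below are assumptions for illustration, not the paper's actual safety controller:

```python
import numpy as np

def min_obstacle_distance(scan):
    """Nearest range reading in the current laser scan (metres)."""
    return float(np.min(scan))

def safe_fallback_policy(scan):
    """Conservative fallback: stop and rotate away from the nearest
    obstacle (a hypothetical stand-in for the paper's safe controller)."""
    nearest_beam = int(np.argmin(scan))
    turn = 1.0 if nearest_beam < len(scan) // 2 else -1.0
    return 0.0, turn

def hybrid_controller(scan, learned_action, safety_radius=0.5):
    """Use the learned policy normally; hand control to the conservative
    fallback whenever an obstacle enters the safety radius."""
    if min_obstacle_distance(scan) < safety_radius:
        return safe_fallback_policy(scan)
    return learned_action

scan = np.full(512, 4.0)          # open space: learned policy stays in charge
print(hybrid_controller(scan, learned_action=(0.8, 0.1)))
scan[100] = 0.3                   # obstacle inside the safety radius
print(hybrid_controller(scan, learned_action=(0.8, 0.1)))
```

Keeping the switch outside the network is one plausible reason the same policy transfers across robot platforms without retuning: the learned part never has to encode platform-specific emergency behavior.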
