Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning
arXiv - CS - Machine Learning Pub Date : 2020-01-18 , DOI: arxiv-2001.06627
Samaneh Hosseini Semnani, Hugh Liu, Michael Everett, Anton de Ruiter, Jonathan P. How

This paper introduces a hybrid algorithm that combines deep reinforcement learning (RL) with force-based motion planning (FMP) to solve the distributed motion planning problem in dense and dynamic environments. Individually, each algorithm has its own limitations: FMP cannot produce time-optimal paths, and existing RL solutions cannot produce collision-free paths in dense environments. We therefore first improve the performance of recent RL approaches by introducing a new reward function that both eliminates the need for a prior supervised learning (SL) step and reduces the chance of collision in crowded environments. This improves performance, but many failure cases remain. We then develop a hybrid approach that falls back on the simpler FMP method in stuck, simple, and high-risk cases, while continuing to use RL in the normal cases where FMP cannot produce an optimal path. We also extend the GA3C-CADRL algorithm to 3D environments. Simulation results show that the proposed algorithm outperforms both the deep RL and FMP baselines, completing up to 50% more scenarios successfully than deep RL and reaching the goal with up to 75% less extra time than FMP.
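The switching logic described above — RL by default, with a fallback to FMP in stuck, simple, or high-risk situations — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the thresholds, the `Agent` structure, and the criteria for "stuck", "simple", and "high risk" are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds (not from the paper): how close another agent must
# be to count as "high risk", and how long without progress counts as "stuck".
RISK_DISTANCE = 1.0   # meters
STUCK_STEPS = 20      # simulation steps

@dataclass
class Agent:
    pos: tuple                      # (x, y, z) position, matching the 3D extension
    steps_without_progress: int = 0

def nearest_neighbor_distance(agent, neighbors):
    """Euclidean distance from the agent to its closest neighbor."""
    if not neighbors:
        return float("inf")
    return min(math.dist(agent.pos, n.pos) for n in neighbors)

def choose_action(agent, neighbors, rl_policy, fmp_planner):
    """Hybrid dispatch: FMP in stuck/simple/high-risk cases, RL otherwise."""
    high_risk = nearest_neighbor_distance(agent, neighbors) < RISK_DISTANCE
    stuck = agent.steps_without_progress > STUCK_STEPS
    simple = not neighbors  # no nearby agents: a straight-line case FMP handles well
    if high_risk or stuck or simple:
        return fmp_planner(agent, neighbors)
    return rl_policy(agent, neighbors)
```

In this sketch `rl_policy` and `fmp_planner` are callables supplied by the caller; the key point is only the case analysis that decides which planner acts at each step.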

Updated: 2020-04-01