Where to go next: Learning a Subgoal Recommendation Policy for Navigation Among Pedestrians
arXiv - CS - Robotics. Pub Date: 2021-02-25, DOI: arxiv-2102.13073. Bruno Brito, Michael Everett, Jonathan P. How, Javier Alonso-Mora
Robotic navigation in environments shared with other robots or humans remains
challenging because the intentions of the surrounding agents are not directly
observable and the environment conditions are continuously changing. Local
trajectory optimization methods, such as model predictive control (MPC), can
deal with those changes but require global guidance, which is not trivial to
obtain in crowded scenarios. This paper proposes to learn, via deep
Reinforcement Learning (RL), an interaction-aware policy that provides
long-term guidance to the local planner. In particular, in simulations with
cooperative and non-cooperative agents, we train a deep network to recommend a
subgoal for the MPC planner. The recommended subgoal is expected to help the
robot make progress toward its goal while accounting for the expected
interaction with other agents. Based on the recommended subgoal, the MPC
planner then optimizes the robot's inputs subject to its kinodynamic and
collision-avoidance constraints. Our approach is shown to substantially
improve navigation performance: it yields fewer collisions than prior MPC
frameworks, and both shorter travel times and fewer collisions than deep RL
methods in cooperative, competitive, and mixed multi-agent scenarios.
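The two-level architecture described above can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's implementation: the learned RL policy is replaced by a hand-coded stand-in that proposes a subgoal toward the goal, and the MPC planner is approximated by sampling a small set of constant-velocity candidates over a short horizon and keeping the lowest-cost one that respects the speed limit and collision radii. All function names, parameters, and numbers are illustrative assumptions.

```python
# Hypothetical sketch of a subgoal-recommender + MPC loop (not the paper's code).
# The "recommender" stands in for the learned deep RL policy; the "MPC" is a
# simple sampling-based receding-horizon planner over constant velocities.
import math

ROBOT_RADIUS = 0.3   # [m], illustrative
AGENT_RADIUS = 0.3   # [m], illustrative
DT = 0.2             # timestep [s]
HORIZON = 5          # MPC lookahead steps
V_MAX = 1.0          # kinodynamic speed limit [m/s]

def recommend_subgoal(robot, goal, agents, lookahead=2.0):
    """Stand-in for the learned policy: propose a point `lookahead` metres
    toward the goal, shifted away from the closest agent if it is too near."""
    dx, dy = goal[0] - robot[0], goal[1] - robot[1]
    d = math.hypot(dx, dy) or 1e-9
    sub = (robot[0] + lookahead * dx / d, robot[1] + lookahead * dy / d)
    if agents:
        a = min(agents, key=lambda p: math.hypot(p[0] - sub[0], p[1] - sub[1]))
        ax, ay = sub[0] - a[0], sub[1] - a[1]
        da = math.hypot(ax, ay) or 1e-9
        if da < 1.0:  # shift the subgoal one metre away from a nearby agent
            sub = (sub[0] + ax / da, sub[1] + ay / da)
    return sub

def mpc_step(robot, subgoal, agents):
    """Sampling-based MPC: among a few headings and speeds, pick the
    collision-free candidate whose endpoint is closest to the subgoal."""
    best, best_cost = (0.0, 0.0), float("inf")  # default: stop if all collide
    headings = 8
    for i in range(headings):
        ang = 2 * math.pi * i / headings
        for speed in (0.25 * V_MAX, 0.5 * V_MAX, V_MAX):
            vx, vy = speed * math.cos(ang), speed * math.sin(ang)
            x, y = robot
            ok = True
            for _ in range(HORIZON):
                x, y = x + vx * DT, y + vy * DT
                for a in agents:  # agents treated as static for simplicity
                    if math.hypot(x - a[0], y - a[1]) < ROBOT_RADIUS + AGENT_RADIUS:
                        ok = False
            if not ok:
                continue
            cost = math.hypot(x - subgoal[0], y - subgoal[1])
            if cost < best_cost:
                best, best_cost = (vx, vy), cost
    return best

robot, goal = (0.0, 0.0), (5.0, 0.0)
agents = [(1.0, 0.3)]  # one pedestrian near the straight-line path
sub = recommend_subgoal(robot, goal, agents)
vx, vy = mpc_step(robot, sub, agents)
print(f"subgoal={sub}, commanded velocity=({vx:.2f}, {vy:.2f})")
```

In this toy scenario the straight-line candidates graze the pedestrian's collision radius, so the planner commands a velocity that dodges below it while still progressing toward the subgoal. The real system replaces the heuristic recommender with a deep network trained by RL in simulation, and the sampled candidates with a full constrained trajectory optimization.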
Updated: 2021-02-26