Where to go next: Learning a Subgoal Recommendation Policy for Navigation Among Pedestrians
arXiv - CS - Robotics. Pub Date: 2021-02-25, DOI: arxiv-2102.13073. Bruno Brito, Michael Everett, Jonathan P. How, Javier Alonso-Mora
Robotic navigation in environments shared with other robots or humans remains
challenging because the intentions of the surrounding agents are not directly
observable and the environment conditions are continuously changing. Local
trajectory optimization methods, such as model predictive control (MPC), can
deal with those changes but require global guidance, which is not trivial to
obtain in crowded scenarios. This paper proposes to learn, via deep
Reinforcement Learning (RL), an interaction-aware policy that provides
long-term guidance to the local planner. In particular, in simulations with
cooperative and non-cooperative agents, we train a deep network to recommend a
subgoal for the MPC planner. The recommended subgoal is expected to help the
robot make progress toward its goal while accounting for the expected
interaction with other agents. Based on the recommended subgoal, the MPC
planner then optimizes the robot's inputs subject to its kinodynamic and
collision-avoidance constraints. Our approach is shown to substantially
improve navigation performance: it yields fewer collisions than prior MPC
frameworks, and both shorter travel times and fewer collisions than deep RL
methods in cooperative, competitive, and mixed multi-agent scenarios.
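The two-level architecture described above can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's implementation: the learned RL policy is replaced by a hand-coded stand-in that proposes a subgoal toward the goal, and the MPC planner is approximated by sampling a small set of constant-velocity candidates over a short horizon and keeping the lowest-cost one that respects the speed limit and collision radii. All function names, parameters, and numbers are illustrative assumptions.

```python
# Hypothetical sketch of a subgoal-recommender + MPC loop (not the paper's code).
# The "recommender" stands in for the learned deep RL policy; the "MPC" is a
# simple sampling-based receding-horizon planner over constant velocities.
import math

ROBOT_RADIUS = 0.3   # [m], illustrative
AGENT_RADIUS = 0.3   # [m], illustrative
DT = 0.2             # timestep [s]
HORIZON = 5          # MPC lookahead steps
V_MAX = 1.0          # kinodynamic speed limit [m/s]

def recommend_subgoal(robot, goal, agents, lookahead=2.0):
    """Stand-in for the learned policy: propose a point `lookahead` metres
    toward the goal, shifted away from the closest agent if it is too near."""
    dx, dy = goal[0] - robot[0], goal[1] - robot[1]
    d = math.hypot(dx, dy) or 1e-9
    sub = (robot[0] + lookahead * dx / d, robot[1] + lookahead * dy / d)
    if agents:
        a = min(agents, key=lambda p: math.hypot(p[0] - sub[0], p[1] - sub[1]))
        ax, ay = sub[0] - a[0], sub[1] - a[1]
        da = math.hypot(ax, ay) or 1e-9
        if da < 1.0:  # shift the subgoal one metre away from a nearby agent
            sub = (sub[0] + ax / da, sub[1] + ay / da)
    return sub

def mpc_step(robot, subgoal, agents):
    """Sampling-based MPC: among a few headings and speeds, pick the
    collision-free candidate whose endpoint is closest to the subgoal."""
    best, best_cost = (0.0, 0.0), float("inf")  # default: stop if all collide
    headings = 8
    for i in range(headings):
        ang = 2 * math.pi * i / headings
        for speed in (0.25 * V_MAX, 0.5 * V_MAX, V_MAX):
            vx, vy = speed * math.cos(ang), speed * math.sin(ang)
            x, y = robot
            ok = True
            for _ in range(HORIZON):
                x, y = x + vx * DT, y + vy * DT
                for a in agents:  # agents treated as static for simplicity
                    if math.hypot(x - a[0], y - a[1]) < ROBOT_RADIUS + AGENT_RADIUS:
                        ok = False
            if not ok:
                continue
            cost = math.hypot(x - subgoal[0], y - subgoal[1])
            if cost < best_cost:
                best, best_cost = (vx, vy), cost
    return best

robot, goal = (0.0, 0.0), (5.0, 0.0)
agents = [(1.0, 0.3)]  # one pedestrian near the straight-line path
sub = recommend_subgoal(robot, goal, agents)
vx, vy = mpc_step(robot, sub, agents)
print(f"subgoal={sub}, commanded velocity=({vx:.2f}, {vy:.2f})")
```

In this toy scenario the straight-line candidates graze the pedestrian's collision radius, so the planner commands a velocity that dodges below it while still progressing toward the subgoal. The real system replaces the heuristic recommender with a deep network trained by RL in simulation, and the sampled candidates with a full constrained trajectory optimization.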
Updated: 2021-02-26