Learning Reward Function with Matching Network for Mapless Navigation.
Sensors (IF 3.4) Pub Date: 2020-06-30, DOI: 10.3390/s20133664
Qichen Zhang, Meiqiang Zhu, Liang Zou, Ming Li, Yong Zhang

Deep reinforcement learning (DRL) has been successfully applied to mapless navigation. An important issue in DRL is designing a reward function to evaluate the agent's actions. However, designing a robust and suitable reward function depends heavily on the designer's experience and intuition. To address this concern, we employ reward shaping derived from trajectories on similar navigation tasks without human supervision, and propose a general reward function based on a matching network (MN). The MN-based reward function gains experience by pre-training on trajectories from different navigation tasks and accelerates DRL training on new tasks. The proposed reward function leaves the optimal policy of DRL unchanged. Simulation results on two static maps show that DRL with the learned reward function converges in fewer iterations than state-of-the-art mapless navigation methods. The proposed method also performs well on dynamic maps with partially moving obstacles. Even when the test maps differ from the training maps, the proposed strategy completes the navigation tasks without additional training.
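The claim that the learned reward leaves the optimal policy unchanged is the defining property of potential-based reward shaping, where the shaping term takes the form F(s, s') = γΦ(s') − Φ(s). The sketch below is an illustration only, not the authors' implementation: it uses a matching-network-style attention over states from previously collected trajectories as the potential Φ. The embedding, the cosine-similarity attention, and names such as support_states, potential, and shaped_reward are assumptions made for this example.

# Hedged sketch: potential-based reward shaping with a matching-network-style
# potential. All identifiers are illustrative, not taken from the paper.
import numpy as np

GAMMA = 0.99  # DRL discount factor

def embed(state: np.ndarray) -> np.ndarray:
    # Placeholder embedding; the paper learns this with a matching network.
    # Here the raw state vector is simply L2-normalised.
    return state / (np.linalg.norm(state) + 1e-8)

def potential(state: np.ndarray, support_states: np.ndarray) -> float:
    # Attention-weighted similarity of `state` to states collected from
    # trajectories of similar tasks (a soft nearest-neighbour match).
    q = embed(state)
    S = np.stack([embed(s) for s in support_states])   # (N, d)
    sims = S @ q                                        # cosine similarities
    attn = np.exp(sims) / np.exp(sims).sum()            # softmax attention
    return float(attn @ sims)                           # weighted similarity in [-1, 1]

def shaped_reward(r_env: float, s: np.ndarray, s_next: np.ndarray,
                  support_states: np.ndarray) -> float:
    # Potential-based shaping F = gamma * Phi(s') - Phi(s); adding F to the
    # environment reward leaves the optimal policy unchanged (Ng et al., 1999).
    F = GAMMA * potential(s_next, support_states) - potential(s, support_states)
    return r_env + F

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support = rng.normal(size=(32, 4))   # states from trajectories of similar tasks
    s, s_next = rng.normal(size=4), rng.normal(size=4)
    print(shaped_reward(-0.01, s, s_next, support))

Because the shaping term is a difference of potentials discounted by γ, it can only redistribute reward along a trajectory; any policy optimal under the original reward remains optimal under the shaped one, which is why pre-training the potential on other tasks can speed up convergence without biasing the final behaviour.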

Updated: 2020-06-30