Deep Reinforcement Learning for Indoor Mobile Robot Path Planning
Sensors (IF 3.9), Pub Date: 2020-09-25, DOI: 10.3390/s20195493
Junli Gao, Weijie Ye, Jing Guo, Zhongjuan Li

This paper proposes a novel incremental training mode to address the problem of Deep Reinforcement Learning (DRL)-based path planning for a mobile robot. First, we evaluate the related graph search algorithms and Reinforcement Learning (RL) algorithms in a lightweight 2D environment. Then, we design the DRL-based algorithm, including the observation states, reward function, network structure, and parameter optimization, in the 2D environment to circumvent the time-consuming work that a 3D environment would require. We transfer the designed algorithm to a simple 3D environment for retraining to obtain the converged network parameters, such as the weights and biases of the deep neural network (DNN). Using these parameters as initial values, we continue to train the model in a complex 3D environment. To improve the generalization of the model across different scenes, we propose to combine the DRL algorithm Twin Delayed Deep Deterministic Policy Gradient (TD3) with the traditional global path planning algorithm Probabilistic Roadmap (PRM) as a novel path planner (PRM+TD3). Experimental results show that the incremental training mode can notably improve development efficiency. Moreover, the PRM+TD3 path planner can effectively improve the generalization of the model.
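The abstract does not include an implementation, but the hierarchical PRM+TD3 idea it describes can be illustrated with a minimal Python sketch: PRM samples a collision-free roadmap over an occupancy grid and returns a coarse waypoint sequence, and a local policy then drives the robot between consecutive waypoints. In the sketch below, all names, parameters, and the toy grid map are illustrative assumptions, and `TD3LocalPolicy` is a stand-in stub that simply heads toward the next waypoint; in the paper's planner that role would be played by the trained TD3 actor acting on the robot's local observations.

```python
"""Minimal PRM + local-policy sketch (illustrative only, not the authors' code)."""
import heapq
import numpy as np


def segment_free(occupancy, a, b, steps=20):
    """True if interpolated points along segment a-b all lie in free cells."""
    for t in np.linspace(0.0, 1.0, steps):
        p = a + t * (b - a)
        if occupancy[int(p[0]), int(p[1])] != 0:
            return False
    return True


def build_prm(occupancy, n_samples=200, connect_radius=3.0, seed=0):
    """Sample free-space nodes and connect nearby pairs with collision-free segments."""
    rng = np.random.default_rng(seed)
    h, w = occupancy.shape
    nodes = []
    while len(nodes) < n_samples:
        p = rng.uniform([0, 0], [h, w])
        if occupancy[int(p[0]), int(p[1])] == 0:      # free cell
            nodes.append(p)
    nodes = np.array(nodes)
    adj = {i: [] for i in range(n_samples)}
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            d = np.linalg.norm(nodes[i] - nodes[j])
            if d <= connect_radius and segment_free(occupancy, nodes[i], nodes[j]):
                adj[i].append((j, d))
                adj[j].append((i, d))
    return nodes, adj


def dijkstra(adj, start, goal):
    """Shortest waypoint index sequence through the roadmap."""
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, np.inf):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, np.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if goal != start and goal not in prev:
        raise RuntimeError("goal not reachable in the sampled roadmap")
    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return path[::-1]


class TD3LocalPolicy:
    """Placeholder for the trained TD3 actor: observation -> velocity command."""
    def act(self, obs):
        d = np.linalg.norm(obs)
        return min(0.5, d) * obs / (d + 1e-8)         # capped step toward waypoint


def follow_waypoints(start, waypoints, policy, tol=0.3, max_steps=500):
    """Drive waypoint-to-waypoint with the local policy; return the trajectory."""
    pos = np.asarray(start, dtype=float)
    trajectory = [pos.copy()]
    for wp in waypoints:
        for _ in range(max_steps):
            if np.linalg.norm(wp - pos) < tol:
                break
            pos = pos + policy.act(wp - pos)
            trajectory.append(pos.copy())
    return np.array(trajectory)


if __name__ == "__main__":
    grid = np.zeros((20, 20), dtype=int)
    grid[8:12, 5:15] = 1                              # rectangular obstacle
    nodes, adj = build_prm(grid, n_samples=150, connect_radius=4.0)
    start_idx = int(np.argmin(np.linalg.norm(nodes - [1.0, 1.0], axis=1)))
    goal_idx = int(np.argmin(np.linalg.norm(nodes - [18.0, 18.0], axis=1)))
    route = dijkstra(adj, start_idx, goal_idx)
    traj = follow_waypoints(nodes[start_idx], nodes[route], TD3LocalPolicy())
    print(f"{len(route)} PRM waypoints, {len(traj)} local steps to reach the goal")
```

The division of labor in this sketch mirrors the combination the abstract describes: PRM supplies the global route structure while the learned policy only has to handle short-range motion toward the next waypoint, which is the pairing the authors report improves the model's generalization across scenes.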

Updated: 2020-09-25