Accelerated Sim-to-Real Deep Reinforcement Learning: Learning Collision Avoidance from Human Player
arXiv - CS - Artificial Intelligence Pub Date : 2021-02-21 , DOI: arxiv-2102.10711 Hanlin Niu, Ze Ji, Farshad Arvin, Barry Lennox, Hujun Yin, Joaquin Carrasco
This paper presents a sensor-level, mapless collision-avoidance algorithm for
mobile robots that maps raw sensor data directly to linear and angular
velocities, enabling navigation in unknown environments without a map. An
efficient training strategy is proposed that allows a robot to learn from both
human experience data and self-exploratory data. A game-style simulation
framework is designed in which a human player tele-operates the mobile robot
to a goal, and the human actions are scored by the same reward function. Both
human-player data and self-play data are sampled using the prioritized
experience replay algorithm.
The proposed algorithm and training strategy have been evaluated in two
experimental configurations: \textit{Environment 1}, a simulated cluttered
environment, and \textit{Environment 2}, a simulated corridor environment. The
proposed method reached the same level of reward using only 16\% of the
training steps required by the standard Deep Deterministic Policy Gradient
(DDPG) method in Environment 1, and 20\% in Environment 2. In an evaluation
over 20 random missions, the proposed method achieved zero collisions after
less than 2~h and 2.5~h of training time in the two Gazebo environments,
respectively, and it generated smoother trajectories than DDPG. The proposed
method has also been deployed on a real robot in a real-world environment for
performance evaluation. The model trained in simulation can be applied
directly to the real-world scenario without further fine-tuning, further
demonstrating its greater robustness compared with DDPG. The video and code
are available at:
https://youtu.be/BmwxevgsdGc
https://github.com/hanlinniu/turtlebot3_ddpg_collision_avoidance
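The abstract describes sampling both human-demonstration and self-play transitions with prioritized experience replay. A minimal proportional-PER sketch is shown below; class and parameter names (`capacity`, `alpha`, the `source` tag) are illustrative assumptions, not taken from the paper's released code, and the full implementation in the linked repository differs in detail.

```python
import random

class PrioritizedReplayBuffer:
    """Simplified proportional prioritized experience replay.

    Transitions from both the human player and the robot's own
    exploration are stored in one buffer; the sampling probability of
    each transition is proportional to (|TD error| + eps)^alpha, so
    informative (often human-demonstrated) transitions are replayed
    more frequently.
    """

    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps              # keeps every priority strictly positive
        self.buffer = []            # (state, action, reward, next_state, done, source)
        self.priorities = []
        self.pos = 0                # ring-buffer write position

    def add(self, transition, source="self"):
        # New transitions receive the current maximum priority so they
        # are replayed at least once before their TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append((*transition, source))
            self.priorities.append(max_p)
        else:
            self.buffer[self.pos] = (*transition, source)
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Proportional sampling: P(i) ∝ priority_i ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return idxs, [self.buffer[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        # Called after a learning step with the fresh TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In this scheme the human player's trajectories and the robot's self-exploration land in the same buffer, distinguished only by the `source` tag; no separate demonstration buffer or fixed mixing ratio is needed, since priorities alone decide how often each transition is replayed.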
Updated: 2021-02-23