A Hierarchical Deep Reinforcement Learning Framework With High Efficiency and Generalization for Fast and Safe Navigation, IEEE Transactions on Industrial Electronics



IEEE Transactions on Industrial Electronics (IF 7.7), Pub Date: 2022-07-20, DOI: 10.1109/tie.2022.3190850
Wei Zhu, Mitsuhiro Hayashibe


We present a hierarchical deep reinforcement learning (DRL) framework with high sampling efficiency and sim-to-real transfer ability for fast and safe navigation. The low-level DRL policy enables the robot to move toward the target position while keeping a safe distance from obstacles; the high-level DRL policy is added to further enhance navigation safety. We select a waypoint located on the path from the robot to the ultimate goal as the subgoal, which reduces the state space and avoids sparse rewards. Moreover, the path is generated from either a local or a global map, which can significantly improve the sampling efficiency, safety, and generalization ability of the proposed DRL framework. Additionally, a target-directed representation of the action space can be derived from the subgoal to improve motion efficiency and reduce the action space. To demonstrate the sampling efficiency, motion performance, obstacle avoidance, and generalization ability of the proposed framework, we conduct extensive comparisons with nonlearning navigation methods and DRL-based baselines; videos, data, code, and other supplemental material are available on our website.
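The subgoal idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_subgoal` and `to_robot_frame`, the `lookahead` distance, and the lookahead-style selection rule itself are all assumptions made for illustration. The sketch picks a waypoint on a precomputed path and expresses it relative to the robot, so that a policy would observe a short-range target instead of the distant final goal.

```python
import math

def select_subgoal(path, robot_xy, lookahead=1.0):
    """Return the first waypoint, at or after the path point nearest the
    robot, that lies at least `lookahead` away from the robot.
    (Hypothetical selection rule; the paper's rule may differ.)"""
    # Index of the path point closest to the robot.
    nearest = min(range(len(path)), key=lambda i: math.dist(path[i], robot_xy))
    # Walk forward along the path until a waypoint is far enough away.
    for point in path[nearest:]:
        if math.dist(point, robot_xy) >= lookahead:
            return point
    # Near the end of the path, fall back to the final goal.
    return path[-1]

def to_robot_frame(subgoal, robot_xy, robot_yaw):
    """Express the subgoal as (distance, bearing) relative to the robot heading."""
    dx = subgoal[0] - robot_xy[0]
    dy = subgoal[1] - robot_xy[1]
    distance = math.hypot(dx, dy)
    # Wrap the bearing into [-pi, pi).
    bearing = (math.atan2(dy, dx) - robot_yaw + math.pi) % (2 * math.pi) - math.pi
    return distance, bearing

# Illustrative straight-line path; values are made up for the example.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
subgoal = select_subgoal(path, robot_xy=(0.2, 0.0), lookahead=1.5)
distance, bearing = to_robot_frame(subgoal, robot_xy=(0.2, 0.0), robot_yaw=0.0)
```

Under these assumptions, the observation is a bounded (distance, bearing) pair regardless of how far away the final goal is, which is one way a subgoal could shrink the state space and keep rewards from becoming sparse.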


Updated: 2022-07-20
