Reinforcement Learning-based Visual Navigation with Information-Theoretic Regularization
arXiv - CS - Robotics. Pub Date: 2019-12-09, DOI: arxiv-1912.04078
Qiaoyun Wu, Kai Xu, Jun Wang, Mingliang Xu, Xiaoxi Gong, Dinesh Manocha

To enhance the cross-target and cross-scene generalization of target-driven visual navigation based on deep reinforcement learning (RL), we introduce an information-theoretic regularization term into the RL objective. The regularization maximizes the mutual information between the agent's navigation actions and its visual observation transforms, thus promoting more informed navigation decisions. To this end, the agent models the action-observation dynamics by learning a variational generative model. Based on this model, the agent generates (imagines) the next observation from its current observation and navigation target. In this way, the agent learns the causal relationship between navigation actions and the changes in its observations, which allows it to predict the next navigation action by comparing the current observation with the imagined next one. Cross-target and cross-scene evaluations on the AI2-THOR framework show that our method attains at least a $10\%$ improvement in average success rate over some state-of-the-art models. We further evaluate our model in two real-world settings: navigation in unseen indoor scenes from the discrete Active Vision Dataset (AVD) and navigation in continuous real-world environments with a TurtleBot. We demonstrate that our navigation model successfully completes navigation tasks in these scenarios. Videos and models can be found in the supplementary material.
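
As a rough illustration only, and not necessarily the paper's exact formulation, a mutual-information-regularized RL objective of the kind described above could be written as follows. All symbols are assumptions introduced here: $\pi_\theta$ is the navigation policy, $r_t$ the reward, $\gamma$ the discount factor, $\alpha$ a regularization weight, $o_t$ and $o_{t+1}$ the current and next observations, $a_t$ the action, and $q_\phi$ a learned variational action predictor.

$$
J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t} \gamma^{t} r_t\Big] + \alpha \, I(a_t ; o_{t+1} \mid o_t)
$$

$$
I(a_t ; o_{t+1} \mid o_t) = H(a_t \mid o_t) - H(a_t \mid o_t, o_{t+1}) \;\ge\; H(a_t \mid o_t) + \mathbb{E}\big[\log q_\phi(a_t \mid o_t, o_{t+1})\big]
$$

The inequality is the standard variational lower bound on mutual information: replacing the true posterior over actions with any approximation $q_\phi$ can only lower the value, because the gap is a non-negative KL divergence. Under this sketch, maximizing the bound rewards actions that can be recovered from the pair of current and next observations, which is consistent with the abstract's description of predicting the next action by comparing the current and imagined next observations.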

Updated: 2020-11-03
