Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
arXiv - CS - Artificial Intelligence. Pub Date: 2020-07-11, DOI: arxiv-2007.05655
Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively growing knowledge of the world; and (2) performing long-range planning and decision making in the form of effective exploration and error correction. Current methods are still limited on both fronts despite extensive efforts. In this paper, we introduce the Evolving Graphical Planner (EGP), a model that performs global planning for navigation based on raw sensory input. The model dynamically constructs a graphical representation, generalizes the action space to allow for more flexible decision making, and performs efficient planning on a proxy graph representation. We evaluate our model on a challenging Vision-and-Language Navigation (VLN) task with photorealistic images and achieve superior performance compared to previous navigation architectures. For instance, we achieve a 53% success rate on the test split of the Room-to-Room navigation task through pure imitation learning, outperforming previous navigation architectures by up to 5%.
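The core idea, an agent that grows a graph of the environment as it explores and plans globally over that graph rather than choosing only among locally adjacent actions, can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the node labels and the scoring function stand in for the learned components (visual features and the instruction-conditioned policy), which are assumptions here.

```python
class EvolvingGraph:
    """Minimal sketch of an incrementally built navigation graph.

    The agent tracks visited nodes and a frontier of observed-but-unvisited
    nodes; each planning step selects the best frontier node globally,
    which naturally supports exploration and error correction (backtracking
    to an earlier frontier node is just another global choice).
    """

    def __init__(self, start):
        self.adj = {start: set()}   # adjacency over all observed nodes
        self.visited = {start}
        self.frontier = set()       # observed but not yet visited

    def observe(self, node, neighbors):
        """Grow the graph with neighbors newly observed from `node`."""
        for nb in neighbors:
            self.adj.setdefault(node, set()).add(nb)
            self.adj.setdefault(nb, set()).add(node)
            if nb not in self.visited:
                self.frontier.add(nb)

    def plan_step(self, score):
        """Globally pick the highest-scoring frontier node to visit next.

        `score` stands in for a learned, instruction-conditioned scorer.
        Returns None when no frontier remains.
        """
        if not self.frontier:
            return None
        best = max(self.frontier, key=score)
        self.frontier.discard(best)
        self.visited.add(best)
        return best


# Toy usage with a hand-written scorer (hypothetical values):
g = EvolvingGraph("A")
g.observe("A", ["B", "C"])
nxt = g.plan_step(lambda n: {"B": 0.2, "C": 0.9}[n])  # selects "C"
```

The key contrast with purely local policies is that `plan_step` ranks every frontier node in the graph, so a step that "backtracks" past several visited nodes costs nothing extra to express in the action space.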

Updated: 2020-07-14