Hierarchical Control of Situated Agents through Natural Language
arXiv - CS - Computation and Language. Pub Date: 2021-09-16, DOI: arxiv-2109.08214
Shuyan Zhou, Pengcheng Yin, Graham Neubig

When humans conceive how to perform a particular task, they do so hierarchically: splitting higher-level tasks into smaller sub-tasks. However, in the literature on natural language (NL) command of situated agents, most works have treated the procedures to be executed as flat sequences of simple actions, or any hierarchies of procedures have been shallow at best. In this paper, we propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control. We further propose a modeling paradigm of hierarchical modular networks, which consist of a planner and reactors that convert NL intents to predictions of executable programs and probe the environment for information necessary to complete the program execution. We instantiate this framework on the IQA and ALFRED datasets for NL instruction following. Our model outperforms reactive baselines by a large margin on both datasets. We also demonstrate that our framework is more data-efficient, and that it allows for fast iterative development.
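
To make the "procedures as programs" idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy environment represented as a dictionary from object names to locations, and every name in it (`find`, `fetch`, `place_in`, `plan`, `run`) is hypothetical. The planner here is a plain lookup table standing in for a learned model, and the reactor is a dictionary lookup standing in for a module that probes the environment.

```python
from typing import Callable, Dict, List

# Hypothetical toy environment: maps each object name to its location.
Env = Dict[str, str]
Program = Callable[[Env, List[str]], None]


def find(env: Env, trace: List[str], obj: str) -> str:
    """Reactor stand-in: probe the environment for the location of obj."""
    location = env[obj]
    trace.append(f"probe({obj}) -> {location}")
    return location


def fetch(env: Env, trace: List[str], obj: str) -> None:
    """Sub-procedure: locate obj, go to it, and pick it up."""
    location = find(env, trace, obj)
    trace.append(f"navigate({location})")
    trace.append(f"pick_up({obj})")


def place_in(env: Env, trace: List[str], obj: str, receptacle: str) -> None:
    """Higher-level procedure that reuses fetch as a sub-procedure."""
    fetch(env, trace, obj)
    trace.append(f"navigate({receptacle})")
    trace.append(f"put({obj}, {receptacle})")


def plan(intent: str) -> Program:
    """Planner stand-in: maps an NL intent to an executable program.

    A lookup table is used here purely for illustration; the paper
    describes a learned planner.
    """
    programs: Dict[str, Program] = {
        "put the mug in the sink":
            lambda env, trace: place_in(env, trace, "mug", "sink"),
    }
    return programs[intent]


def run(intent: str, env: Env) -> List[str]:
    """Plan a program for the intent, execute it, and return the action trace."""
    trace: List[str] = []
    plan(intent)(env, trace)
    return trace


if __name__ == "__main__":
    for step in run("put the mug in the sink", {"mug": "countertop"}):
        print(step)
```

In this sketch the hierarchy is expressed through ordinary function calls: `place_in` delegates to `fetch`, which delegates to `find`, so the flat action sequence only appears in the execution trace rather than in the plan itself.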


