Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing.
Frontiers in Neurorobotics (IF 2.6), Pub Date: 2020-06-25, DOI: 10.3389/fnbot.2020.00043
Jinpeng Mi (1, 2), Jianzhi Lyu (2), Song Tang (1, 2), Qingdu Li (1), Jianwei Zhang (2)

Natural language provides an intuitive and effective interface for interaction between humans and robots. Several approaches have been proposed to address natural language visual grounding for human-robot interaction. However, most existing approaches resolve the ambiguity of natural language queries and ground target objects through dialogue systems, which makes the interaction cumbersome and time-consuming. In contrast, we address interactive natural language grounding without auxiliary information. Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. This network extracts visual semantics with a visual semantic-aware network and exploits the rich linguistic context of the expressions with a language attention network. We then combine the referring expression comprehension network with scene graph parsing to ground unrestricted and complex natural language queries. Finally, we validate the referring expression comprehension network on three public datasets, and we evaluate the effectiveness of the interactive natural language grounding architecture through extensive natural language query grounding experiments in different household scenarios.
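The abstract does not say how the output of scene graph parsing is combined with the referring expression comprehension network. The toy sketch below only illustrates the general idea of grounding a parsed (subject, relation, object) triple by scoring every pair of detected objects. It is not the authors' network: `phrase_score` is a hypothetical stand-in for a learned matcher (it simply compares the phrase's last word with a detector label), and `relation_score` is a hypothetical hand-written spatial check.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Box:
    label: str  # detector class label (assumed to be given)
    x: float    # normalized horizontal center of the box
    y: float    # normalized vertical center of the box


def phrase_score(phrase: str, box: Box) -> float:
    # Hypothetical stand-in for a learned referring-expression matcher:
    # 1.0 if the phrase's last word (its head noun here) equals the label.
    return 1.0 if phrase.split()[-1] == box.label else 0.0


def relation_score(relation: str, a: Box, b: Box) -> float:
    # Hypothetical hand-written spatial relations between box centers.
    if relation == "left of":
        return 1.0 if a.x < b.x else 0.0
    if relation == "right of":
        return 1.0 if a.x > b.x else 0.0
    return 0.0


def ground_triple(subject: str, relation: str, obj: str, boxes: list[Box]) -> int:
    """Return the index of the box that best grounds the triple's subject."""
    best_index, best_score = -1, float("-inf")
    # Try every ordered pair of distinct boxes as (subject, object).
    for i, j in product(range(len(boxes)), repeat=2):
        if i == j:
            continue
        score = (phrase_score(subject, boxes[i])
                 + phrase_score(obj, boxes[j])
                 + relation_score(relation, boxes[i], boxes[j]))
        if score > best_score:
            best_index, best_score = i, score
    return best_index


boxes = [Box("cup", 0.2, 0.5), Box("laptop", 0.5, 0.5), Box("cup", 0.8, 0.5)]
print(ground_triple("red cup", "right of", "laptop", boxes))  # prints 2
```

Because both cups match the phrase "red cup" equally well under this crude matcher, only the parsed relation ("right of the laptop") disambiguates them, which is the kind of case where combining expression matching with a parsed relation structure can help.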

Updated: 2020-06-25