Computational construction grammar for visual question answering, Linguistics Vanguard



Linguistics Vanguard (IF 1.1), Pub Date: 2019-12-11, DOI: 10.1515/lingvan-2018-0070
Jens Nevens, Paul Van Eecke, Katrien Beuls


Abstract: To answer a natural language question, a computational system needs three main capabilities. First, it must be able to analyze the question into a structured query, revealing its component parts and how they are combined. Second, it needs access to relevant knowledge sources, such as databases, texts or images. Third, it must be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method uses a computational construction grammar model to map questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach are (i) its transparency and interpretability, (ii) its extensibility, and (iii) its independence from annotated training data.
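To make the idea of an executable semantic representation concrete, here is a minimal Python sketch. It is not the authors' construction grammar system and it skips the parsing step entirely: the program for the question is written by hand. The names `SceneObject`, `SCENE` and `execute` are hypothetical, the symbolic scene stands in for an image, and the operation names are loosely modeled on CLEVR-style functional programs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """One object in a symbolic scene (a stand-in for a processed image)."""
    shape: str
    color: str
    size: str


# Hypothetical scene, invented for illustration only.
SCENE = [
    SceneObject("cube", "red", "large"),
    SceneObject("sphere", "blue", "small"),
    SceneObject("cube", "blue", "small"),
]


def execute(program, scene):
    """Run a linear program, given as a list of (operation, argument) steps.

    Filter steps narrow the current set of objects; `count` and `exist`
    turn that set into the final answer.
    """
    current = list(scene)
    for op, arg in program:
        if op == "filter_shape":
            current = [o for o in current if o.shape == arg]
        elif op == "filter_color":
            current = [o for o in current if o.color == arg]
        elif op == "count":
            current = len(current)
        elif op == "exist":
            current = len(current) > 0
        else:
            raise ValueError(f"unknown operation: {op}")
    return current


# Hand-written program for "How many blue cubes are there?"
HOW_MANY_BLUE_CUBES = [
    ("filter_color", "blue"),
    ("filter_shape", "cube"),
    ("count", None),
]

if __name__ == "__main__":
    print(execute(HOW_MANY_BLUE_CUBES, SCENE))
```

In the setup the abstract describes, the grammar would produce the program from the question text, and execution would run against an image, not a hand-built list of objects.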


Updated: 2019-12-11
