GoG: Relation-aware Graph-over-Graph Network for Visual Dialog

arXiv - CS - Computation and Language Pub Date : 2021-09-17 , DOI: arxiv-2109.08475
Feilong Chen, Xiuyi Chen, Fandong Meng, Peng Li, Jie Zhou

Visual dialog, which aims to hold a meaningful conversation with humans about a given image, is a challenging task that requires models to reason about the complex dependencies among visual content, dialog history, and the current question. Graph neural networks have recently been applied to model the implicit relations between objects in an image or dialog. However, they neglect the importance of 1) coreference relations within the dialog history and dependency relations between words for the question representation; and 2) the representation of the image based on the fully represented question. Therefore, we propose a novel relation-aware graph-over-graph network (GoG) for visual dialog. Specifically, GoG consists of three sequential graphs: 1) H-Graph, which aims to capture coreference relations within the dialog history; 2) History-aware Q-Graph, which aims to fully understand the question by capturing dependency relations between words based on coreference resolution over the dialog history; and 3) Question-aware I-Graph, which aims to capture the relations between objects in an image based on the full question representation. We add GoG to an existing visual dialog model as an additional feature representation module. Experimental results show that our model outperforms the strong baseline in both generative and discriminative settings by a significant margin.
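
The sequential ordering of the three graphs described in the abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the update rule, so the sketch assumes plain mean-aggregation message passing and additive conditioning, and the names `propagate` and `gog_forward` are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Edge = Tuple[int, int]  # (source node index, destination node index)


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def propagate(nodes: List[Vector], edges: List[Edge],
              context: Optional[Vector] = None) -> List[Vector]:
    """One round of mean-aggregation message passing (assumed update rule).

    If a context vector is given, it is added to every node first, which
    stands in for conditioning one graph on the output of the previous one.
    """
    if context is not None:
        nodes = [[x + c for x, c in zip(node, context)] for node in nodes]
    updated = []
    for i, node in enumerate(nodes):
        incoming = [nodes[src] for src, dst in edges if dst == i]
        updated.append(mean([node] + incoming))
    return updated


def gog_forward(history: List[Vector], coref_edges: List[Edge],
                words: List[Vector], dep_edges: List[Edge],
                objects: List[Vector], object_edges: List[Edge]) -> Vector:
    """Run the three graphs in sequence: H-Graph, Q-Graph, I-Graph."""
    # 1) H-Graph: dialog-history nodes linked by coreference edges.
    history_summary = mean(propagate(history, coref_edges))
    # 2) History-aware Q-Graph: question words linked by dependency edges.
    question_summary = mean(propagate(words, dep_edges, history_summary))
    # 3) Question-aware I-Graph: image objects linked by relation edges.
    return mean(propagate(objects, object_edges, question_summary))
```

The point of the sketch is the data flow: each graph is built over its own node set, and the pooled output of one stage conditions the next, so the image representation depends on the question, which in turn depends on the resolved history.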

Updated: 2021-09-20
