Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering
IEEE Transactions on Neural Networks and Learning Systems ( IF 10.4 ) Pub Date : 2020-09-17 , DOI: 10.1109/tnnls.2020.3017530
Liyang Zhang , Shuaicheng Liu , Donghao Liu , Pengpeng Zeng , Xiangpeng Li , Jingkuan Song , Lianli Gao

Visual question answering (VQA), which involves understanding an image and a paired question, has developed rapidly with the advances of deep learning in related research fields such as natural language processing and computer vision. Existing works rely heavily on knowledge contained in the data set. However, some questions require more specialized cues beyond data-set knowledge to be answered correctly. To address this issue, we propose a novel framework named knowledge-based augmentation network (KAN) for VQA. We introduce object-related open-domain knowledge to assist question answering. Concretely, we extract more visual information from images and introduce a knowledge graph to provide the common sense or experience necessary for the reasoning process. For these two augmented inputs, we design an attention module that adjusts itself according to the specific question, so that the importance of external knowledge relative to detected objects can be balanced adaptively. Extensive experiments show that our KAN achieves state-of-the-art performance on three challenging VQA data sets, i.e., VQA v2, VQA-CP v2, and FVQA. In addition, our open-domain knowledge also benefits VQA baselines. Code is available at https://github.com/yyyanglz/KAN .
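The core idea of the attention module described above, weighting detected-object features and retrieved knowledge embeddings by their relevance to the question, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the function name, dot-product scoring, and a single shared embedding dimension are all simplifying assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_guided_attention(question, objects, knowledge):
    """Fuse object and knowledge features conditioned on the question.

    question  : (d,)        question embedding
    objects   : (n_obj, d)  detected-object visual features
    knowledge : (n_kb, d)   retrieved knowledge-graph entity embeddings

    Attention scores over the combined candidate set depend on the
    question, so how much weight external knowledge receives relative
    to detected objects shifts adaptively with each question.
    """
    candidates = np.vstack([objects, knowledge])  # (n_obj + n_kb, d)
    scores = candidates @ question                # dot-product relevance
    weights = softmax(scores)                     # sums to 1 over all candidates
    return weights @ candidates                   # fused context vector, (d,)
```

A question whose embedding aligns with knowledge entries (e.g., one needing common sense) will push attention mass toward the knowledge rows, while a purely visual question concentrates it on the object rows.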
