Visual question answering model based on graph neural network and contextual attention
Image and Vision Computing (IF 4.7), Pub Date: 2021-03-29, DOI: 10.1016/j.imavis.2021.104165
Himanshu Sharma , Anand Singh Jalal

Visual Question Answering (VQA) has recently emerged as an active research area at the intersection of computer vision and natural language processing. A VQA model extracts both image and question features and fuses them to predict an answer to a given natural-language question about an image. However, most attention-based VQA approaches concentrate on extracting visual information from regions of interest for answer prediction, ignoring both the relations between those regions and the reasoning carried out among them. They also disregard the regions attended in previous steps, even though this previously attended visual content can guide the selection of subsequent regions of attention. In this paper, a novel VQA model is presented that exploits the relationships between regions via a graph neural network and employs a visual-context-based attention mechanism that takes the previously attended visual content into account. Experimental results demonstrate that the proposed model improves answer-prediction accuracy on the publicly available VQA 1.0 and VQA 2.0 datasets.
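The abstract describes the architecture only at a high level, so the following PyTorch sketch is a minimal, hypothetical illustration of the two ideas it names: a graph layer that relates detected region features, and an attention step conditioned on a running visual context that summarizes previously attended content. Everything here (the module names RegionGraphLayer, ContextualAttention, and TinyVQA, the 512-dimensional features, the 36 regions, the dot-product affinities) is an assumption for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a graph-plus-contextual-attention VQA step.
# Not the paper's released code; shapes and names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionGraphLayer(nn.Module):
    """One round of message passing over fully connected region nodes."""

    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, regions):  # regions: (B, N, D)
        # Pairwise affinities between regions via dot products (an assumption).
        adj = torch.softmax(regions @ regions.transpose(1, 2), dim=-1)  # (B, N, N)
        messages = adj @ self.msg(regions)  # aggregate neighbor information
        return F.relu(self.upd(torch.cat([regions, messages], dim=-1)))


class ContextualAttention(nn.Module):
    """Attention over regions guided by the question AND the attended history."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(3 * dim, 1)

    def forward(self, regions, question, context):  # (B,N,D), (B,D), (B,D)
        B, N, D = regions.shape
        q = question.unsqueeze(1).expand(B, N, D)
        c = context.unsqueeze(1).expand(B, N, D)
        logits = self.score(torch.cat([regions, q, c], dim=-1)).squeeze(-1)
        alpha = torch.softmax(logits, dim=-1)  # (B, N) attention weights
        attended = (alpha.unsqueeze(-1) * regions).sum(dim=1)  # (B, D)
        # The running context accumulates previously attended content.
        return attended, context + attended


class TinyVQA(nn.Module):
    def __init__(self, dim=512, num_answers=1000, hops=2):
        super().__init__()
        self.graph = RegionGraphLayer(dim)
        self.attn = ContextualAttention(dim)
        self.hops = hops
        self.classifier = nn.Linear(2 * dim, num_answers)

    def forward(self, regions, question):
        regions = self.graph(regions)  # relate regions before attending
        context = torch.zeros_like(question)  # no attended history at hop 0
        for _ in range(self.hops):  # multi-hop, context-aware attention
            attended, context = self.attn(regions, question, context)
        fused = torch.cat([attended, question], dim=-1)
        return self.classifier(fused)  # answer logits


if __name__ == "__main__":
    model = TinyVQA()
    regions = torch.randn(2, 36, 512)  # e.g. 36 detector region features
    question = torch.randn(2, 512)     # encoded question vector
    print(model(regions, question).shape)  # torch.Size([2, 1000])
```

The design choice to carry `context` across hops is one plausible reading of "previously attended visual content": each attention step sees a summary of what earlier steps focused on, which can steer it toward complementary regions rather than re-attending the same ones.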



Updated: 2021-04-06