Local Relation Network with Multilevel Attention for Visual Question Answering
Journal of Visual Communication and Image Representation (IF 2.6), Pub Date: 2020-01-20, DOI: 10.1016/j.jvcir.2020.102762
Bo Sun, Zeng Yao, Yinghui Zhang, Lejun Yu

With the tremendous success of visual question answering (VQA) tasks, visual attention mechanisms have become an indispensable part of VQA models. However, these attention-based methods ignore the relationships among image regions, even though such relationships are crucial for the model to understand the image thoroughly. We propose local relation networks (LRNs) that generate a context-aware image feature for each image region, containing information on its relationships with the other image regions. Furthermore, we propose a multilevel attention mechanism that combines semantic information from the LRNs and the original image regions, rendering the decision of the model more reasonable. With these two measures, we improve the region representation and achieve better attention and VQA performance. We conduct numerous experiments on the COCO-QA dataset and the largest VQA v2.0 benchmark dataset. Our model achieves competitive results, and visual demonstrations show the effectiveness of the proposed LRNs and multilevel attention mechanism.
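
The abstract does not specify the architecture, so the following is only a rough illustrative sketch of the general idea, not the authors' model. It assumes that region features and the question are already embedded in the same d-dimensional space, that relations are scored with scaled dot products, and that the two attention levels are fused by concatenation. All function names (context_aware_features, attend, multilevel_attention) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def context_aware_features(regions):
    """For each region, summarize the OTHER regions (assumed relation scoring).

    regions: float array of shape (n, d) with n >= 2.
    Returns an (n, d) array whose row i is a weighted average of rows j != i.
    """
    n, d = regions.shape
    scores = regions @ regions.T / np.sqrt(d)  # pairwise similarity (assumption)
    np.fill_diagonal(scores, -np.inf)          # a region does not attend to itself
    weights = softmax(scores, axis=1)          # each row sums to 1 over the others
    return weights @ regions

def attend(features, question):
    """Question-guided attention over a set of (n, d) features."""
    scores = features @ question / np.sqrt(features.shape[1])
    weights = softmax(scores, axis=0)          # one weight per region
    return weights @ features, weights

def multilevel_attention(regions, question):
    """Attend separately to the original and the context-aware features, then fuse."""
    context = context_aware_features(regions)
    v_region, w_region = attend(regions, question)
    v_context, w_context = attend(context, question)
    fused = np.concatenate([v_region, v_context])  # fusion by concatenation (assumption)
    return fused, w_region, w_context

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(5, 8))   # 5 hypothetical image regions, d = 8
    question = rng.normal(size=8)       # hypothetical question embedding
    fused, w_region, w_context = multilevel_attention(regions, question)
    print(fused.shape, w_region.round(3), w_context.round(3))
```

In this sketch the fused vector would be passed to an answer classifier. Comparing the two sets of attention weights gives a simple way to inspect how much the relation-based features change where the model looks.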

Updated: 2020-01-20
