Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-09-23, DOI: arXiv:2009.11118
Tuong Do, Binh X. Nguyen, Huy Tran, Erman Tjiputra, Quang D. Tran, Thanh-Toan Do

Different approaches have been proposed for Visual Question Answering (VQA). However, few works examine how different joint-modality methods behave with respect to question-type prior knowledge extracted from data, even though this information constrains the answer search space and provides a reliable cue for reasoning about answers to questions asked about input images. In this paper, we propose a novel VQA model that utilizes question-type prior information to improve VQA by leveraging multiple interactions between different joint-modality methods, based on how they behave when answering questions of different types. Thorough experiments on two benchmark datasets, i.e., VQA 2.0 and TDIUC, indicate that the proposed method achieves the best performance compared with the most competitive approaches.
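The abstract describes the idea only at a high level. As a rough illustration of how a question-type prior could gate several joint-modality fusion operators before answer classification, consider the following minimal PyTorch sketch. It is not the authors' implementation: all module names, dimensions (e.g., 12 question types as in TDIUC, a 3129-answer vocabulary as commonly used for VQA 2.0), and the softmax-gating scheme are illustrative assumptions.

# Minimal sketch (not the authors' method): a question-type prior predicted from the
# question embedding gates several hypothetical joint-modality fusion operators.
import torch
import torch.nn as nn


class QuestionTypeGatedVQA(nn.Module):
    def __init__(self, img_dim=2048, q_dim=1024, joint_dim=512,
                 num_question_types=12, num_answers=3129, num_fusions=3):
        super().__init__()
        # Several independent joint-modality fusion operators; in practice these
        # could be different bilinear/attention fusion variants.
        self.fusions = nn.ModuleList([
            nn.Sequential(nn.Linear(img_dim + q_dim, joint_dim), nn.ReLU())
            for _ in range(num_fusions)
        ])
        # Question-type prior predicted from the question embedding, then mapped
        # to a per-fusion weight so each question type favors certain fusions.
        self.type_classifier = nn.Linear(q_dim, num_question_types)
        self.type_to_fusion_weights = nn.Linear(num_question_types, num_fusions)
        self.answer_classifier = nn.Linear(joint_dim, num_answers)

    def forward(self, img_feat, q_feat):
        # img_feat: (B, img_dim), q_feat: (B, q_dim)
        x = torch.cat([img_feat, q_feat], dim=-1)
        joint = torch.stack([f(x) for f in self.fusions], dim=1)  # (B, F, joint_dim)

        type_prior = torch.softmax(self.type_classifier(q_feat), dim=-1)       # (B, T)
        gate = torch.softmax(self.type_to_fusion_weights(type_prior), dim=-1)  # (B, F)

        # Combine fusion outputs according to the question-type-driven gate,
        # which biases the answer classifier toward type-consistent answers.
        combined = (gate.unsqueeze(-1) * joint).sum(dim=1)  # (B, joint_dim)
        return self.answer_classifier(combined), type_prior


if __name__ == "__main__":
    model = QuestionTypeGatedVQA()
    logits, type_prior = model(torch.randn(4, 2048), torch.randn(4, 1024))
    print(logits.shape, type_prior.shape)  # (4, 3129) and (4, 12)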

Updated: 2020-09-24