Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Zhu, Xi; Mao, Zhendong; Liu, Chunxiao; Zhang, Peng; Wang, Bin; Zhang, Yongdong

arXiv:2012.11528 (cs)

[Submitted on 17 Dec 2020]

Abstract: Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., "what color is the banana?") with high-frequency answers (e.g., "yellow") while ignoring the image content. Existing approaches tackle this problem by designing elaborate models or by introducing additional visual annotations, reducing question dependency while strengthening image dependency. However, they remain subject to the language prior problem because the data biases themselves have not been alleviated. In this paper, we introduce a self-supervised learning framework to address this problem. Concretely, we first automatically generate labeled data to balance the biased data, and then propose a self-supervised auxiliary task that uses the balanced data to help the base VQA model overcome language priors. Our method compensates for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method significantly outperforms the state of the art, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark, VQA-CP v2. In other words, we increase performance over annotation-based methods by a relative 16% without using external annotations.
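
The two accuracy figures and the 16% figure in the abstract can be reconciled with a short calculation. The sketch below assumes that the 16% is a relative gain measured against the 49.50% baseline; the abstract does not state this explicitly. The function and variable names are hypothetical and do not come from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement of `improved` over `baseline` as a percentage of `baseline`."""
    return (improved - baseline) / baseline * 100.0


# Overall accuracies (%) on VQA-CP v2 as quoted in the abstract.
baseline_acc = 49.50  # prior best result (assumed to be the annotation-based baseline)
ours_acc = 57.59      # result reported for the proposed method

absolute_gain = ours_acc - baseline_acc             # difference in percentage points
relative = relative_gain(baseline_acc, ours_acc)    # difference relative to the baseline

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative:.1f}%")
```

Under that assumption, the gain is about 8.09 percentage points in absolute terms and about 16.3% in relative terms, which matches the rounded 16% in the abstract.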

Comments:	Accepted by IJCAI 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2012.11528 [cs.CV]
	(or arXiv:2012.11528v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2012.11528

Submission history

From: Xi Zhu
[v1] Thu, 17 Dec 2020 12:30:12 UTC (10,791 KB)
