Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering

Gong, Haifan; Chen, Guanqi; Liu, Sishuo; Yu, Yizhou; Li, Guanbin

Computer Science > Multimedia

arXiv:2105.00136 (cs)

[Submitted on 1 May 2021]

Abstract: Due to the severe lack of labeled data, existing methods for medical visual question answering usually rely on transfer learning to obtain effective image feature representations, and then use cross-modal fusion of visual and linguistic features to predict question-related answers. These two phases are performed independently, without considering whether the pre-trained features are compatible with, and applicable to, cross-modal fusion. We therefore reformulate image feature pre-training as a multi-task learning paradigm, which forces the pre-training to take into account the applicability of the features to the specific image comprehension task, and we find this formulation markedly superior. Furthermore, we introduce a cross-modal self-attention (CMSA) module that selectively captures long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at this https URL.
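The abstract does not specify how the CMSA module is built, so the following is only a minimal sketch of the general idea of self-attention over a joint sequence of visual and linguistic tokens, not the authors' implementation. The function name `cross_modal_self_attention`, the random projection matrices, the residual connection, and the token counts and feature dimension are all assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Small random values standing in for learned weights or features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cross_modal_self_attention(visual, words, w_q, w_k, w_v):
    """Hypothetical sketch: scaled dot-product self-attention over the
    concatenation of visual tokens and word tokens.

    Returns (fused, weights): fused[i] is the attended feature for token i,
    and weights[i][j] is how strongly token i attends to token j.
    """
    tokens = visual + words                  # joint sequence: regions, then words
    q = matmul(tokens, w_q)                  # queries
    k = matmul(tokens, w_k)                  # keys
    v = matmul(tokens, w_v)                  # values
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]     # transpose keys to (dim x n)
    scores = matmul(q, k_t)                  # (n x n) pairwise similarities
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Residual connection (an assumption): w_v must map back to the input size.
    fused = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
    return fused, weights


rng = random.Random(0)
dim = 4                                      # illustrative feature dimension
visual = random_matrix(3, dim, rng)          # 3 hypothetical image-region features
words = random_matrix(2, dim, rng)           # 2 hypothetical question-word features
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
fused, weights = cross_modal_self_attention(visual, words, w_q, w_k, w_v)
print(len(fused), len(fused[0]))             # number of tokens, feature size
```

Because every token attends to every other token in the joint sequence, an image region can weight a question word (and vice versa) regardless of position, which is one plausible reading of "long-range contextual relevance" in the abstract.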

Comments:	ICMR '21: ACM International Conference on Multimedia Retrieval, Taipei, Taiwan, August 21-24, 2021
Subjects:	Multimedia (cs.MM)
Cite as:	arXiv:2105.00136 [cs.MM]
	(or arXiv:2105.00136v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2105.00136

Submission history

From: Haifan Gong
[v1] Sat, 1 May 2021 00:49:26 UTC (271 KB)
