Visual question answering: a state-of-the-art review, Artificial Intelligence Review


Visual question answering: a state-of-the-art review
Artificial Intelligence Review (IF 12.0), Pub Date: 2020-04-08, DOI: 10.1007/s10462-020-09832-7
Sruthy Manmadhan, Binsu C. Kovoor

Visual question answering (VQA) is a task that has received considerable attention from two major research communities: computer vision and natural language processing. It has recently been widely accepted as an AI-complete task that can serve as an alternative to the visual Turing test. In its most common form, it is a challenging multi-modal task in which a computer must provide the correct answer to a natural language question asked about an input image. It has attracted many deep learning researchers following their remarkable achievements in text, voice and vision technologies. This review extensively and critically examines the current status of VQA research in terms of step-by-step solution methodologies, datasets and evaluation metrics. Finally, it discusses future research directions for each of these aspects of VQA separately.


Updated: 2020-04-08
