Visual Question Answering: which investigated applications?
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-03-04, DOI: arxiv-2103.02937
Silvio Barra, Carmen Bisogni, Maria De Marsico, Stefano Ricciardi

Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Computer Vision (CV) and Natural Language Processing (NLP) have recently met. In image captioning and video summarization, the semantic information is completely contained in still images or video dynamics, and it only has to be mined and expressed in a human-consistent way. By contrast, in VQA the semantic information in the same media must be compared with the semantics implied by a question expressed in natural language, doubling the artificial intelligence-related effort. Some recent surveys of VQA approaches have focused on the methods underlying either the image-related processing or the verbal-related one, or on the way to consistently fuse the conveyed information. In those surveys, possible applications are only suggested and, in fact, most cited works rely on general-purpose datasets that are used to assess the building blocks of a VQA system. This paper instead considers the proposals that focus on real-world applications, possibly using as benchmarks suitable data bound to the application domain. The paper also reports on some recent challenges in VQA research.

Updated: 2021-03-05