Beyond visual semantics: Exploring the role of scene text in image understanding
Pattern Recognition Letters (IF 3.9), Pub Date: 2021-07-01, DOI: 10.1016/j.patrec.2021.06.011
Arka Ujjal Dey, Suman K. Ghosh, Ernest Valveny, Gaurav Harit

Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to visual features and neglect the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We not only extract and encode visual and scene text cues but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images with scene text content to demonstrate its effectiveness. In the retrieval framework, we augment the contextual semantic representation with scene text cues to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous scene text recognition, we also apply query-based attention to the text channel. We show that our multi-channel approach, involving contextual semantics and scene text, improves upon the absolute accuracy of the current state-of-the-art methods on the Advertisement Images Dataset by 8.9% in the relevant statement retrieval task and by 5% in the topic classification task.
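
The abstract does not specify how the query-based attention over the text channel is computed. As a rough illustration of the general idea only, the sketch below assumes plain dot-product attention: each recognized scene text word is weighted by its similarity to a query vector, so that irrelevant or misrecognized words contribute less to the pooled text representation. The function name `attend_scene_text`, the toy vectors, and the choice of dot-product scoring are all assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_scene_text(query, word_embeddings):
    """Hypothetical query-based attention over scene text words.

    Scores each word embedding by its dot product with the query,
    normalizes the scores with softmax, and returns the attention
    weights together with the weighted sum of the word embeddings.
    """
    scores = [dot(query, w) for w in word_embeddings]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(wt * w[i] for wt, w in zip(weights, word_embeddings))
        for i in range(dim)
    ]
    return weights, pooled


# Toy example: the first word aligns with the query, the third opposes it.
query = [1.0, 0.0]
words = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
weights, pooled = attend_scene_text(query, words)
```

In this toy setup the word most aligned with the query receives the largest weight, which mirrors the stated motivation of down-weighting irrelevant or erroneous recognitions. A real system would presumably use learned embeddings and projections rather than raw dot products.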




Updated: 2021-07-12