Residual Attention-Based Multi-Scale Script Identification in Scene Text Images
Neurocomputing ( IF 5.5 ) Pub Date : 2021-01-01 , DOI: 10.1016/j.neucom.2020.09.015
Mengkai Ma , Qiu-Feng Wang , Shan Huang , Shen Huang , Yannis Goulermas , Kaizhu Huang

Abstract Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Owing to complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an open problem. Under the general classification framework of script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we utilize a hierarchical feature fusion block to extract multi-scale features; furthermore, we adopt an attention mechanism to locate the locally discriminative parts of the feature maps. In the classification layer, we utilize a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015, and MLe2e, and the experimental results show the effectiveness of each carefully designed component. Finally, we achieve state-of-the-art performance, with correct rates of 89.66%, 96.11%, 98.78%, and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015, and MLe2e, respectively.
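The classification layer described above (a fully convolutional classifier producing one score map per script class, reduced by global pooling) can be sketched in a few lines. This is a minimal illustration assuming a global *average* pooling variant, not the authors' implementation; the function names, the 2x2 score maps, and the three script classes are all hypothetical.

```python
# Sketch of the channel-level classification + global pooling idea:
# a fully convolutional classifier emits one 2-D score map per script
# class, and global average pooling collapses each map to a single
# logit, so text images of any width can be classified.

def global_avg_pool(score_maps):
    """score_maps: list of 2-D score maps (nested lists), one per class."""
    logits = []
    for m in score_maps:
        total = sum(sum(row) for row in m)
        count = len(m) * len(m[0])
        logits.append(total / count)
    return logits

def classify(score_maps, class_names):
    """Return the class whose pooled logit is largest."""
    logits = global_avg_pool(score_maps)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return class_names[best]

# Hypothetical 2x2 score maps for three script classes:
maps = [
    [[0.1, 0.2], [0.0, 0.1]],   # Latin
    [[0.9, 0.8], [0.7, 1.0]],   # Arabic
    [[0.3, 0.1], [0.2, 0.2]],   # Cyrillic
]
print(classify(maps, ["Latin", "Arabic", "Cyrillic"]))  # -> Arabic
```

Because the pooling step is spatially global, the classifier head contains no fully connected layer tied to a fixed input size, which is what makes the design efficient for variable-width scene text crops.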

Updated: 2021-01-01