Two-Dimensional Multi-Scale Perceptive Context for Scene Text Recognition
Neurocomputing (IF 6) Pub Date: 2020-11-01, DOI: 10.1016/j.neucom.2020.06.071
Haojie Li , Daihui Yang , Shuangping Huang , Kin-Man Lam , Lianwen Jin , Zhenzhou Zhuang

Abstract Inspired by speech recognition, most recent state-of-the-art works cast scene text recognition as sequence prediction. As in most speech recognition problems, context modeling is considered a critical component of these methods for achieving better performance. However, they usually consider only a holistic or single-scale local sequence context, in a single dimension. In reality, scene texts and their contexts may span a two-dimensional (2-D) space in arbitrary orientations and styles, not only horizontally. Moreover, contexts at various scales may jointly contribute to text recognition, particularly for irregular text. In our method, we model context in a 2-D manner and simultaneously reason over contexts at multiple scales, from local to global. Based on this, we propose a new Two-Dimensional Multi-Scale Perceptive Context (TDMSPC) module, which performs multi-scale context learning along both the horizontal and vertical directions and then merges the results. This generates shape- and layout-dependent feature maps for scene text recognition. The proposed module can be readily inserted into existing sequence-based frameworks to replace their context learning mechanism. Furthermore, a new scene text recognition network, called TDMSPC-Net, is built by using the TDMSPC module as the building block of the encoder and adopting an attention-based LSTM as the decoder. Experiments on benchmark datasets show that the TDMSPC module substantially boosts the performance of existing sequence-based scene text recognizers, irrespective of the decoder or backbone network used. The proposed TDMSPC-Net achieves state-of-the-art accuracy on all the benchmark datasets.
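The TDMSPC module itself is learned end-to-end; as a rough, hypothetical illustration only (not the authors' implementation), the NumPy sketch below approximates the core idea: for each scale, aggregate context along the horizontal and vertical directions of a 2-D feature map with a sliding window, then merge the directions and the scales. Here simple average pooling stands in for the learned context operators, and averaging stands in for the learned merge; the function names and the choice of scales are assumptions for illustration.

```python
import numpy as np

def directional_context(feat, window, axis):
    """Sliding average pooling along one axis of an (H, W, C) feature map,
    with edge padding so the output keeps the input's spatial size."""
    pad = window // 2
    pad_spec = [(pad, pad) if i == axis else (0, 0) for i in range(feat.ndim)]
    padded = np.pad(feat, pad_spec, mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for i in range(feat.shape[axis]):
        src = [slice(None)] * feat.ndim
        src[axis] = slice(i, i + window)          # window centered (after padding) at i
        dst = [slice(None)] * feat.ndim
        dst[axis] = slice(i, i + 1)
        out[tuple(dst)] = padded[tuple(src)].mean(axis=axis, keepdims=True)
    return out

def tdmspc_sketch(feat, scales=(1, 3, 5)):
    """2-D multi-scale context sketch: at each scale, pool along the vertical
    (axis 0) and horizontal (axis 1) directions, merge the two directions,
    then merge across scales. `scales` are illustrative window sizes."""
    per_scale = []
    for s in scales:
        h_ctx = directional_context(feat, s, axis=1)  # horizontal context
        v_ctx = directional_context(feat, s, axis=0)  # vertical context
        per_scale.append((h_ctx + v_ctx) / 2.0)       # merge directions
    return np.mean(per_scale, axis=0)                 # merge scales

# Example: a 4x6 feature map with 2 channels keeps its spatial shape.
feat = np.random.rand(4, 6, 2)
ctx = tdmspc_sketch(feat)
```

In the paper, such a context block replaces the single-dimension, single-scale context mechanism of an existing sequence-based recognizer, so the encoder's feature maps already carry 2-D, multi-scale context before the attention-based LSTM decoder reads them.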

Updated: 2020-11-01