SaHAN: Scale-aware hierarchical attention network for scene text recognition
Pattern Recognition Letters (IF 3.9) Pub Date: 2020-06-15, DOI: 10.1016/j.patrec.2020.06.009
Jiaxin Zhang, Canjie Luo, Lianwen Jin, Tianwei Wang, Ziyan Li, Weiying Zhou

Scene text recognition has become a research hotspot owing to its abundant semantic information and wide range of applications. Recent scene text recognition methods usually focus on handling shape distortion, attention drift, or background noise, while ignoring the character scale-variation problem that text recognition also encounters. To address this issue, we propose a new scale-aware hierarchical attention network (SaHAN) for scene text recognition. Inspired by the feature pyramid network, we exploit the inherent pyramidal structure of a deep convolutional network to retain multi-scale features with flexible receptive fields. We then construct a hierarchical attention decoder that applies the attention mechanism twice over the multi-scale features to collect the most fine-grained information for prediction. SaHAN is trained in a weakly supervised manner, requiring only images and their corresponding text labels. Extensive experiments on seven benchmarks show that SaHAN achieves state-of-the-art performance.
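The two-stage decoding idea can be illustrated with a minimal NumPy sketch: attend within each pyramid level to get one glimpse per scale, then attend across the scale-level glimpses. This is only an illustration of the hierarchical-attention concept under assumed shapes and dot-product scoring; it is not the paper's actual architecture, and all function names here are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(query, features):
    # dot-product attention: weight each feature row by its
    # similarity to the query, then return the weighted sum
    scores = features @ query           # (T,)
    weights = softmax(scores)           # (T,), sums to 1
    return weights @ features           # (D,)

def hierarchical_attention(query, multi_scale_feats):
    # stage 1: attention within each scale -> one glimpse per level
    glimpses = np.stack([attend(query, f) for f in multi_scale_feats])
    # stage 2: attention across the scale-level glimpses
    return attend(query, glimpses)      # (D,)

rng = np.random.default_rng(0)
D = 8
# three pyramid levels with progressively shorter feature sequences
scales = [rng.normal(size=(t, D)) for t in (16, 8, 4)]
q = rng.normal(size=D)                  # decoder query (e.g. hidden state)
g = hierarchical_attention(q, scales)
print(g.shape)                          # (8,)
```

The second attention stage is what makes the decoder "scale-aware": instead of concatenating or averaging pyramid levels, it learns (here, computes) how much each scale should contribute at each decoding step.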




Updated: 2020-06-23