MASTER: Multi-aspect non-local network for scene text recognition
Pattern Recognition (IF 8), Pub Date: 2021-04-15, DOI: 10.1016/j.patcog.2021.107980
Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai

Attention-based scene text recognizers have achieved great success; they leverage a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture. However, such methods suffer from the attention-drift problem, because high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods have low efficiency due to poor parallelization. To overcome these problems, we propose MASTER, a self-attention-based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention, which encodes feature-feature and target-target relationships inside the encoder and decoder, (2) learns an intermediate representation that is more powerful and more robust to spatial distortion, and (3) offers high training efficiency because of highly parallel training, and fast inference because of an efficient memory-cache mechanism. Extensive experiments on various benchmarks demonstrate the superior performance of MASTER on both regular and irregular scene text. PyTorch code can be found at https://github.com/wenwenyu/MASTER-pytorch, and TensorFlow code can be found at https://github.com/jiangxiluning/MASTER-TF.
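
The abstract attributes fast inference to a memory-cache mechanism but does not spell it out. The sketch below illustrates the general idea of caching keys and values in step-by-step self-attention decoding. It is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the weights `W_Q`, `W_K`, `W_V` and the helper names `project`, `attend`, `decode_without_cache`, and `decode_with_cache` are hypothetical, and the input vectors stand in for embeddings of already-decoded symbols.

```python
import math

# Illustrative 2x2 projection weights (hypothetical values, not from the paper).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [-0.2, 0.8]]
W_V = [[1.0, 0.5], [0.3, -1.0]]


def project(vec, weight):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def decode_without_cache(tokens):
    """At every step, re-project the whole prefix into keys and values."""
    outputs, n_proj = [], 0
    for t, tok in enumerate(tokens):
        prefix = tokens[: t + 1]
        keys = [project(x, W_K) for x in prefix]
        values = [project(x, W_V) for x in prefix]
        query = project(tok, W_Q)
        n_proj += 2 * len(prefix) + 1  # count projections performed this step
        outputs.append(attend(query, keys, values))
    return outputs, n_proj


def decode_with_cache(tokens):
    """Project only the newest token and reuse cached keys and values."""
    outputs, n_proj = [], 0
    key_cache, value_cache = [], []
    for tok in tokens:
        key_cache.append(project(tok, W_K))
        value_cache.append(project(tok, W_V))
        query = project(tok, W_Q)
        n_proj += 3  # one query, one key, one value projection per step
        outputs.append(attend(query, key_cache, value_cache))
    return outputs, n_proj
```

Calling both functions on the same sequence should give matching outputs, while the cached version performs a number of projections that grows linearly with sequence length rather than quadratically.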




Updated: 2021-04-19