STAN: A Sequential Transformation Attention-based Network for Scene Text Recognition
Pattern Recognition (IF 8), Pub Date: 2021-03-01, DOI: 10.1016/j.patcog.2020.107692
Qingxiang Lin, Canjie Luo, Lianwen Jin, Songxuan Lai

Abstract: Scene text with an irregular layout is difficult to recognize. To address this, a Sequential Transformation Attention-based Network (STAN), comprising a sequential transformation network and an attention-based recognition network, is proposed for general scene text recognition. The sequential transformation network rectifies irregular text by decomposing the task into a series of patch-wise basic transformations, followed by a grid projection submodule that smooths the junctions between neighboring patches. The entire rectification process can be trained end to end in a weakly supervised manner, requiring only images and their corresponding ground-truth text. Based on the rectified images, an attention-based recognition network predicts the character sequence. Experiments on several benchmarks demonstrate the state-of-the-art performance of STAN on both regular and irregular text.
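
The patch-wise idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that each "basic transformation" is a 2x3 affine matrix applied to one vertical strip of the image, and it substitutes a simple moving average along the width for the grid projection submodule, whose actual form the abstract does not describe. All function names (`patch_grid`, `build_grid`, `smooth_junctions`, `sample_nearest`) are hypothetical, and the affine parameters are supplied by hand rather than predicted by a network.

```python
import numpy as np

def patch_grid(theta, patch_w, height, x_offset):
    """Source coordinates for one strip under a 2x3 affine matrix (assumed form)."""
    ys, xs = np.meshgrid(
        np.arange(height, dtype=float),
        np.arange(patch_w, dtype=float) + x_offset,
        indexing="ij",
    )
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, patch_w, 3)
    return coords @ theta.T                                  # (H, patch_w, 2): (x_src, y_src)

def build_grid(thetas, width, height):
    """Concatenate per-strip grids; assumes width is divisible by len(thetas)."""
    patch_w = width // len(thetas)
    grids = [patch_grid(t, patch_w, height, i * patch_w) for i, t in enumerate(thetas)]
    return np.concatenate(grids, axis=1)

def smooth_junctions(grid, k=3):
    """Moving average along the width: a stand-in for the grid projection step."""
    pad = k // 2
    padded = np.pad(grid, ((0, 0), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(grid)
    for s in range(k):
        out += padded[:, s:s + grid.shape[1], :]
    return out / k

def sample_nearest(image, grid):
    """Nearest-neighbour sampling of the image at the grid's source coordinates."""
    h, w = image.shape[:2]
    xs = np.clip(np.rint(grid[..., 0]), 0, w - 1).astype(int)
    ys = np.clip(np.rint(grid[..., 1]), 0, h - 1).astype(int)
    return image[ys, xs]

# Usage: four strips, the second one shifted down by two pixels.
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shift_down = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 2.0]])
image = np.arange(8 * 16).reshape(8, 16)
grid = smooth_junctions(build_grid([identity, shift_down, identity, identity], 16, 8))
rectified = sample_nearest(image, grid)
```

In a trainable model the sampling would need to be differentiable (for example bilinear rather than nearest-neighbour) so that a recognition loss alone can drive the transformation parameters, which is what the weakly supervised setting in the abstract implies.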

Updated: 2021-03-01