Scene text recognition via context modeling for low-quality image in logistics industry
Complex & Intelligent Systems ( IF 5.8 ) Pub Date : 2022-11-30 , DOI: 10.1007/s40747-022-00916-1
Herui Heng, Peiji Li, Tuxin Guan, Tianyu Yang

Text recognition has recently been applied in many fields, such as robot vision, video retrieval, and scene understanding. However, minimal research has been conducted in the field of logistics, where images of express sheets captured by cameras are mostly curved, distorted, and of low resolution. In this study, a new method is proposed to address this research gap while simultaneously considering irregular and low-resolution English letters. The entire approach comprises a rectification module, a convolutional neural network (CNN) extractor, a semantic context module (SCM), a global context module (GCM), and a lightweight transformer decoder that improves training speed. In particular, we introduce context modeling into the method. (1) The SCM is introduced to capture full-image dependencies and generate rich semantic context information. (2) We propose the GCM, which not only enhances long-range dependencies from the output of the SCM but also outputs abundant pixel information to the self-attention decoder. (3) To solve the low-resolution text recognition problem in a large number of express sheet scenes, we propose Chinese datasets for improving intelligent logistics. Experiments conducted on six public benchmarks demonstrate that the developed method achieves better robustness to low-resolution and irregular text images.




Updated: 2022-11-30