Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement
International Journal on Document Analysis and Recognition (IF 1.8), Pub Date: 2019-08-08, DOI: 10.1007/s10032-019-00341-0
Anna Zhu, Chen Zhang, Zhi Li, Shengwu Xiong

Document localization is a promising step for document-based optical character recognition. The task becomes harder when documents appear in complex natural scene images. In this paper, we propose a coarse-to-fine document localization approach that detects the four corner points of the document in natural scene images. In the first stage, the four corners are roughly predicted by a deep neural network-based Joint Corner Detector (JCD) with an attention mechanism, which roughly localizes the document region via an attention map. The key to accurate corner inference is that the JCD module substantially suppresses background interference in the convolutional features. In the second stage, a corner-specific refiner module refines the previously predicted corners. Because the four document corners have different characteristics, the patches cropped around the predicted corners are fed into four different corner-specific CNN models, which recursively search for the accurate corner locations. Three datasets (the ICDAR 2015 SmartDoc competition 1 dataset, the SEECS-NUSF dataset, and a self-collected dataset) are used to evaluate the performance of our method. The experimental results demonstrate the superiority of the proposed method in localizing documents in natural images, especially those with complex backgrounds. Our method outperforms most state-of-the-art methods.
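
The second-stage idea described above (crop a patch around each coarse corner, re-estimate the corner inside it, and repeat) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not give patch sizes, the stopping rule, or the CNN architecture, so `refine_corner`, `make_toy_refiner`, `localize`, the shrink factor, and the minimum patch size are all hypothetical, and a simple clamping function stands in for each corner-specific CNN.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical stand-in for one corner-specific CNN: given the patch centre and
# the patch side length, return the estimated corner in image coordinates.
Refiner = Callable[[Point, float], Point]


def refine_corner(coarse: Point, refiner: Refiner, patch: float = 64.0,
                  shrink: float = 0.5, min_patch: float = 4.0) -> Point:
    """Recursively re-estimate a corner inside ever smaller patches.

    The patch size, shrink factor and stopping rule are assumptions made for
    this sketch; the abstract does not specify them.
    """
    if patch < min_patch:
        return coarse
    estimate = refiner(coarse, patch)
    return refine_corner(estimate, refiner, patch * shrink, shrink, min_patch)


def make_toy_refiner(true_corner: Point) -> Refiner:
    """Toy model that 'sees' only the cropped patch around the current estimate."""
    def refiner(center: Point, patch: float) -> Point:
        half = patch / 2.0
        # Clamp the answer to the patch, mimicking a model limited to the crop.
        x = min(max(true_corner[0], center[0] - half), center[0] + half)
        y = min(max(true_corner[1], center[1] - half), center[1] + half)
        return (x, y)
    return refiner


def localize(coarse_corners: List[Point], refiners: List[Refiner]) -> List[Point]:
    """Refine each coarse corner with its own refiner (one per document corner)."""
    return [refine_corner(c, r) for c, r in zip(coarse_corners, refiners)]
```

In this sketch each of the four corners gets its own refiner, mirroring the four corner-specific models, and the shrinking patch lets a coarse estimate that starts far from the true corner move closer over several steps.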

Updated: 2019-08-08