DenseNet-CTC: An end-to-end RNN-free architecture for context-free string recognition, Computer Vision and Image Understanding



DenseNet-CTC: An end-to-end RNN-free architecture for context-free string recognition
Computer Vision and Image Understanding (IF 4.3), Pub Date: 2021-01-21, DOI: 10.1016/j.cviu.2021.103168
Hongjian Zhan, Shujing Lyu, Yue Lu, Umapada Pal

String recognition is one of the challenging tasks in document analysis and recognition. With the recent surge of interest in end-to-end, segmentation-free methods, CRNN (Convolutional Recurrent Neural Network), a combination of a CNN (Convolutional Neural Network) and RNN-CTC (Recurrent Neural Network with Connectionist Temporal Classification), has been widely applied to string recognition. However, in context-free cases such as digit strings, where a character may be followed by any other character, the strings contain few or no contextual links. In this paper, we propose a new end-to-end RNN-free architecture designed for context-free string recognition and apply it to the Handwritten Digit String Recognition (HDSR) task. The proposed architecture is based on CNN and CTC but uses no RNN; column-wise fully connected layers connect the convolutional layers to CTC directly. Moreover, to compensate for the possible loss of modeling capacity caused by the absence of an RNN, we use densely connected convolutional layers to extract efficient features. We test this architecture on three public HDSR benchmarks (ORAND-CAR-A, ORAND-CAR-B and CVL HDS) and on three other datasets: the handwritten telephone/postcode dataset PhPAIS and two non-Arabic digit datasets (C-Bangla and C-Hindi). Furthermore, we generate three handwritten digit string datasets to further analyze the influence of the RNN. The recognition results on all datasets demonstrate the superiority of the proposed model.
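
The abstract describes the RNN-free pipeline only at a high level, so the following is a toy sketch of its last two stages, not the authors' implementation. It assumes the convolutional stack has already produced one feature vector per image column, applies a single shared linear layer to every column (one plausible reading of "column-wise fully connected layers"), and then decodes with CTC best-path (greedy) decoding. The function names, the blank index 0, and the choice of greedy decoding at inference time are all assumptions made for illustration; the densely connected feature extractor and CTC training loss are omitted.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def columnwise_fc(features: Sequence[Sequence[float]],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[List[float]]:
    """Apply one shared linear layer to every column (hypothetical helper).

    features: T columns, each a feature vector of length D.
    weights:  C rows of length D, one row per output class (blank included).
    bias:     C values.
    Returns T score vectors of length C. No recurrence is involved:
    each column is scored independently of its neighbours.
    """
    return [
        [sum(w * x for w, x in zip(row, column)) + b
         for row, b in zip(weights, bias)]
        for column in features
    ]


def ctc_greedy_decode(scores: Sequence[Sequence[float]]) -> List[int]:
    """Best-path decoding: argmax per column, merge repeats, drop blanks."""
    best_path = [max(range(len(col)), key=col.__getitem__) for col in scores]
    decoded = []
    previous = BLANK
    for label in best_path:
        # Emit a label only when it is not blank and differs from the
        # label of the previous column.
        if label != BLANK and label != previous:
            decoded.append(label)
        previous = label
    return decoded


# Toy input: 5 columns of 3-dimensional features, identity weights,
# so the per-column argmax path is [1, 1, 0, 1, 2].
toy_features = [[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = ctc_greedy_decode(columnwise_fc(toy_features, identity, [0, 0, 0]))
```

In this toy run the blank column between the second and fourth columns is what lets the same class appear twice in a row in the output, which matters for digit strings where repeated digits are common.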


Updated: 2021-01-29
