Accurate, Data-Efficient, Unconstrained Text Recognition with Convolutional Neural Networks,Pattern Recognition


Pattern Recognition (IF 8) Pub Date: 2020-12-01, DOI: 10.1016/j.patcog.2020.107482
Mohamed Yousef, Khaled F. Hussain, Usama S. Mohammed

Unconstrained text recognition is an important computer vision task, featuring a wide variety of sub-tasks, each with its own set of challenges. One of the biggest promises of deep neural networks has been the convergence and automation of feature extraction from raw input signals, allowing for the highest possible performance with minimal domain knowledge. To this end, we propose a data-efficient, end-to-end neural network model for generic, unconstrained text recognition. In our proposed architecture we strive for simplicity and efficiency without sacrificing recognition accuracy. The architecture is a fully convolutional network without any recurrent connections, trained with the CTC loss function. Thus it operates on arbitrary input sizes and produces strings of arbitrary length in a very efficient and parallelizable manner. We show the generality and superiority of the architecture by achieving state-of-the-art results on seven public benchmark datasets, covering a wide spectrum of text recognition tasks, namely: handwriting recognition, CAPTCHA recognition, OCR, license plate recognition, and scene text recognition. The architecture also won the ICFHR2018 Competition on Automated Text Recognition on a READ Dataset.
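To illustrate how a CTC-trained network can emit strings of arbitrary length from per-frame outputs, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not taken from the paper: the function name `ctc_greedy_decode`, the convention that class index 0 is the blank symbol, and the toy alphabet are all assumptions made for illustration, and the abstract does not say which decoding scheme the authors use.

```python
from typing import List, Sequence

BLANK = 0  # assumed convention: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding (illustrative sketch).

    frame_scores holds one score vector per output frame; index 0 is the
    blank and index k (k >= 1) corresponds to alphabet[k - 1].
    The decoder takes the highest-scoring class in each frame, merges
    consecutive repeats, and then drops blanks.
    """
    decoded: List[str] = []
    previous = BLANK
    for scores in frame_scores:
        # Index of the highest-scoring class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen in the immediately preceding frame.
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)
```

Because the number of frames grows with the input width and the decoder only merges and drops symbols, the same procedure applies unchanged to inputs of any size; a blank between two identical classes is what allows doubled letters to survive the merge step.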


Updated: 2020-12-01
