Towards efficient unconstrained handwriting recognition using Dilated Temporal Convolution Network
Expert Systems with Applications (IF 7.5), Pub Date: 2020-09-12, DOI: 10.1016/j.eswa.2020.114004
Annapurna Sharma, Dinesh Babu Jayagopi

Recognition of cursive handwritten images has advanced considerably with recent recurrent architectures and attention mechanisms. Most work focuses on improving transcription performance in terms of Character Error Rate (CER) and Word Error Rate (WER). Existing models, however, are slow to train and test. Furthermore, recent studies recommend that models be not only effective in task performance but also environmentally friendly in terms of carbon footprint. Having reviewed recent state-of-the-art models, these studies recommend taking model training and retraining time into account during design. Long training times increase costs in both resources and carbon footprint. This is challenging for handwriting recognition models built on the popular recurrent architectures. The issue is especially critical because line images are usually very wide, which results in long sequences to decode. In this work, we present a fully convolutional deep network architecture for cursive handwriting recognition from line-level images. The architecture combines 2-D convolutions and 1-D dilated non-causal convolutions with a Connectionist Temporal Classification (CTC) output layer. This offers high parallelism with fewer parameters. We further report experiments with various image re-scaling factors and show how they affect the performance of the proposed model. A data augmentation pipeline used during model training is also analyzed. The experiments show that our model achieves CER and WER comparable to those of recurrent architectures. We compare against state-of-the-art models with different architectures based on Recurrent Neural Networks (RNNs) and their variants. The analysis reports training performance and network details on three different datasets of English and French handwriting. 
This shows our model has fewer parameters and takes less training and testing time, making it suitable for low-resource and environment-friendly deployment.
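
To make the "1-D dilated non-causal convolutions" in the abstract concrete, the following is a minimal single-channel sketch in plain Python, not the authors' implementation. The kernel size of 3, the dilation schedule `[1, 2, 4, 8]`, the all-ones weights, and the function names `dilated_conv1d` and `receptive_field` are all assumptions chosen for illustration; the abstract does not state the paper's actual settings. The sketch shows that each output step draws on both past and future input steps, and that stacking layers with growing dilation widens the context quickly, which is one way a convolutional stack can cover long line-image sequences without recurrence.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Non-causal, zero-padded 1-D dilated convolution over one channel.

    Output step t mixes inputs at t + (k - center) * dilation, so it sees
    both earlier and later steps. Positions outside the sequence count as
    zero. As in most deep learning code, this is cross-correlation (the
    kernel is not flipped).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("use an odd kernel size so the window is symmetric")
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input steps that can influence one output step of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed, illustrative settings (not taken from the paper).
dilations = [1, 2, 4, 8]
signal = [0.0] * 31
signal[15] = 1.0  # unit impulse in the middle of the sequence

# Push the impulse through the stack of dilated layers.
for d in dilations:
    signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], d)

# Indices reached by the impulse after the whole stack.
touched = [i for i, v in enumerate(signal) if v != 0.0]
print(len(touched), receptive_field(3, dilations))
```

Because every output step depends only on the input sequence and not on the previous output step, all steps of a layer can be computed in parallel, in contrast to a recurrent layer that must process the sequence step by step.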




Updated: 2020-09-12