Building efficient CNN architecture for offline handwritten Chinese character recognition, International Journal on Document Analysis and Recognition


International Journal on Document Analysis and Recognition (IF 1.8), Pub Date: 2018-08-29, DOI: 10.1007/s10032-018-0311-4
Zhiyuan Li, Nanjun Teng, Min Jin, Huaxiang Lu

Methods based on deep convolutional neural networks (CNNs) have brought major breakthroughs in image classification and provide an end-to-end solution to the handwritten Chinese character recognition (HCCR) problem by learning discriminative features automatically. Nevertheless, state-of-the-art CNNs appear to incur a huge computational cost and require the storage of a large number of parameters, especially in the fully connected layers, which makes such networks difficult to deploy on alternative hardware devices with limited computational capacity. To solve the storage problem, we propose a novel technique called weighted average pooling, which reduces the number of parameters in the fully connected layer without loss of accuracy. In addition, we implement a cascaded model within a single CNN by adding a mid output so that recognition can complete as early as possible, which significantly reduces the average inference time. Experiments are performed on the ICDAR-2013 offline HCCR dataset. Our proposed approach needs only 6.9 ms on average to classify a character image and achieves state-of-the-art accuracy of 97.1% while requiring only 3.3 MB of storage.
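The abstract names the two techniques but does not define them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that weighted average pooling collapses each channel's feature map to a single value using a learned per-position weight map, and that the cascaded model accepts the mid output whenever its softmax confidence passes a threshold. The function names, the threshold value, and the layer sizes are all hypothetical.

```python
import math

def global_average_pool(fmap):
    """Plain global average pooling: one scalar per channel.
    fmap is a list of channels, each an H x W list of rows."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def weighted_average_pool(fmap, weights):
    """Assumed form of weighted average pooling: every channel has its own
    H x W weight map and the output is the weighted sum over positions."""
    return [
        sum(x * k for row, wrow in zip(ch, w) for x, k in zip(row, wrow))
        for ch, w in zip(fmap, weights)
    ]

def fc_param_counts(channels, height, width, classes):
    """Weight counts (biases ignored) for two hypothetical classifier heads:
    flatten -> FC, versus weighted pooling -> FC."""
    flatten_head = channels * height * width * classes
    pooled_head = channels * height * width + channels * classes
    return flatten_head, pooled_head

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cascaded_predict(mid_logits, final_logits_fn, threshold=0.9):
    """Early exit (assumed rule): return the mid output's class if its top
    probability reaches `threshold`; otherwise run the remaining layers,
    represented here by the callable `final_logits_fn`."""
    probs = softmax(mid_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, "mid"
    final = softmax(final_logits_fn())
    return max(range(len(final)), key=final.__getitem__), "final"
```

With uniform weights of 1/(H·W), `weighted_average_pool` reduces to `global_average_pool`, so under this reading ordinary average pooling is a special case. The saving shown by `fc_param_counts` comes from the class count multiplying only the channel count rather than every spatial position, and the inference-time saving in `cascaded_predict` comes from skipping `final_logits_fn` for inputs the mid output already classifies confidently.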


Updated: 2018-08-29
