HWNet v2: an efficient word image representation for handwritten documents
International Journal on Document Analysis and Recognition (IF 2.3) Pub Date: 2019-07-31, DOI: 10.1007/s10032-019-00336-x
Praveen Krishnan , C. V. Jawahar

We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with a traditional classification loss. The major strengths of our work lie in: (i) the efficient use of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with region-of-interest pooling (referred to as HWNet v2), which learns discriminative features for variable-sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions that mimics the natural process of handwriting. We further investigate transfer learning to reduce the domain gap between the synthetic and real domains, and analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our representation leads to state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with a minimal representation size. On the challenging IAM dataset, our method is the first to report an mAP of around 0.90 for word spotting with a representation size of just 32 dimensions. Furthermore, we also present results on printed document datasets in English and Indic scripts, which validate the generic nature of the proposed framework for learning word image representations.
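The key idea behind item (ii) — region-of-interest pooling over variable-sized word images — can be illustrated with a minimal sketch. The function below is not the authors' implementation; it is a hypothetical, dependency-free illustration of how max pooling over an adaptive grid of bins maps a feature map of any width to a fixed-size output, which is what lets a single network handle words of different lengths. The bin count (2 × 8) is an arbitrary choice for illustration.

```python
from math import ceil, floor

def roi_max_pool(feature_map, out_h=2, out_w=8):
    """Pool a variable-sized feature map into a fixed (out_h x out_w) grid.

    feature_map: list of C channels, each an H x W nested list of floats.
    Returns a C x out_h x out_w nested list. Bin edges use floor/ceil so
    every output cell covers at least one input cell, regardless of H, W.
    """
    h = len(feature_map[0])
    w = len(feature_map[0][0])
    pooled = []
    for channel in feature_map:
        grid = []
        for i in range(out_h):
            # Vertical extent of this row of bins.
            y0 = floor(i * h / out_h)
            y1 = max(y0 + 1, ceil((i + 1) * h / out_h))
            row = []
            for j in range(out_w):
                # Horizontal extent of this bin.
                x0 = floor(j * w / out_w)
                x1 = max(x0 + 1, ceil((j + 1) * w / out_w))
                # Max-pool over the bin.
                row.append(max(channel[y][x]
                               for y in range(y0, y1)
                               for x in range(x0, x1)))
            grid.append(row)
        pooled.append(grid)
    return pooled
```

Feature maps from a short and a long word image (different W) both pool to the same C × 2 × 8 shape, so a fixed-size classifier or embedding layer can sit on top.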

Updated: 2019-07-31