An attention-based row-column encoder-decoder model for text recognition in Japanese historical documents
Pattern Recognition Letters (IF 5.1), Pub Date: 2020-05-27, DOI: 10.1016/j.patrec.2020.05.026
Nam Tuan Ly, Cuong Tuan Nguyen, Masaki Nakagawa

This paper presents an attention-based row-column encoder-decoder (ARCED) model for recognizing an input image containing multiple text lines from Japanese historical documents without explicit line segmentation. The recognition system has three main parts: a feature extractor, a row-column encoder, and a decoder. We introduce a row-column BLSTM in the encoder and a residual LSTM network in the decoder. The whole system is trained end to end with a standard cross-entropy loss function, requiring only document images and their ground-truth text. We experimentally evaluate the performance of ARCED on the Kana-PRMU dataset of Japanese historical documents. The results show that ARCED outperforms state-of-the-art recognition methods on this dataset. Furthermore, we demonstrate that the row-column BLSTM in the encoder and the residual LSTM in the decoder improve the performance of the encoder-decoder model for recognizing Japanese historical documents.
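The abstract only outlines the architecture at a high level. The PyTorch sketch below illustrates how a row-column BLSTM encoder and an attention-based decoder with a residual connection could be wired together and trained with cross-entropy; the CNN backbone, layer sizes, start-token convention, and vocabulary size are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of an attention-based row-column encoder-decoder (assumed configuration).
import torch
import torch.nn as nn


class RowColumnEncoder(nn.Module):
    """CNN feature extractor followed by row-wise and column-wise BLSTMs."""

    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                      # toy backbone (assumption)
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.row_blstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.col_blstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, images):                         # images: (B, 1, H, W)
        f = self.cnn(images)                           # (B, C, H', W')
        b, c, h, w = f.shape
        rows = f.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_blstm(rows)                 # scan each row of the feature map
        cols = rows.reshape(b, h, w, -1).permute(0, 2, 1, 3).reshape(b * w, h, -1)
        cols, _ = self.col_blstm(cols)                 # scan each column of the feature map
        feats = cols.reshape(b, w, h, -1).permute(0, 2, 1, 3)
        return feats.reshape(b, h * w, -1)             # flatten to a sequence for attention


class AttentionDecoder(nn.Module):
    """LSTM decoder with additive attention and a residual path from the context vector."""

    def __init__(self, vocab_size, enc_dim=512, hidden=256, emb=128):
        super().__init__()
        self.hidden = hidden
        self.embed = nn.Embedding(vocab_size, emb)
        self.attn_enc = nn.Linear(enc_dim, hidden)
        self.attn_dec = nn.Linear(hidden, hidden)
        self.attn_score = nn.Linear(hidden, 1)
        self.lstm = nn.LSTMCell(emb + enc_dim, hidden)
        self.proj = nn.Linear(enc_dim, hidden)         # residual-style shortcut before the output layer
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, enc, targets, sos_id=0):         # enc: (B, T, enc_dim), targets: (B, L)
        b = enc.size(0)
        h = enc.new_zeros(b, self.hidden)
        c = enc.new_zeros(b, self.hidden)
        prev = torch.full((b,), sos_id, dtype=torch.long, device=enc.device)
        logits = []
        for t in range(targets.size(1)):
            # additive attention over all encoder positions
            scores = self.attn_score(torch.tanh(self.attn_enc(enc) + self.attn_dec(h).unsqueeze(1)))
            alpha = torch.softmax(scores, dim=1)       # (B, T, 1)
            context = (alpha * enc).sum(dim=1)         # (B, enc_dim)
            h, c = self.lstm(torch.cat([self.embed(prev), context], dim=1), (h, c))
            logits.append(self.out(h + self.proj(context)))
            prev = targets[:, t]                       # teacher forcing with ground-truth characters
        return torch.stack(logits, dim=1)              # (B, L, vocab)


if __name__ == "__main__":
    vocab_size = 100                                   # hypothetical character set size
    encoder, decoder = RowColumnEncoder(), AttentionDecoder(vocab_size)
    images = torch.randn(2, 1, 64, 256)                # dummy document images
    targets = torch.randint(0, vocab_size, (2, 20))    # dummy ground-truth character ids
    logits = decoder(encoder(images), targets)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                    # end-to-end training step with cross-entropy loss
```

Because the entire pipeline is differentiable, a single cross-entropy objective on the character sequence is enough to train the feature extractor, the row-column encoder, and the decoder jointly, which is what makes segmentation-free training on page images with only transcriptions possible.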




Updated: 2020-05-27