An attention-based row-column encoder-decoder model for text recognition in Japanese historical documents
Pattern Recognition Letters (IF 5.1), Pub Date: 2020-05-27, DOI: 10.1016/j.patrec.2020.05.026
Nam Tuan Ly, Cuong Tuan Nguyen, Masaki Nakagawa

This paper presents an attention-based row-column encoder-decoder (ARCED) model for recognizing an input image containing multiple text lines from Japanese historical documents without explicit line segmentation. The recognition system has three main parts: a feature extractor, a row-column encoder, and a decoder. We introduce a row-column BLSTM in the encoder and a residual LSTM network in the decoder. The whole system is trained end to end with a standard cross-entropy loss function, requiring only document images and their ground-truth text. We experimentally evaluate the performance of ARCED on the Kana-PRMU dataset of Japanese historical documents. The results show that ARCED outperforms state-of-the-art recognition methods on this dataset. Furthermore, we demonstrate that the row-column BLSTM in the encoder and the residual LSTM in the decoder improve the performance of the encoder-decoder model for recognizing Japanese historical documents.
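The abstract only outlines the architecture at a high level. The PyTorch sketch below illustrates how a row-column BLSTM encoder and an attention-based decoder with a residual connection could be wired together and trained with cross-entropy; the CNN backbone, layer sizes, start-token convention, and vocabulary size are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of an attention-based row-column encoder-decoder (assumed configuration).
import torch
import torch.nn as nn


class RowColumnEncoder(nn.Module):
    """CNN feature extractor followed by row-wise and column-wise BLSTMs."""

    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                      # toy backbone (assumption)
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.row_blstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.col_blstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, images):                         # images: (B, 1, H, W)
        f = self.cnn(images)                           # (B, C, H', W')
        b, c, h, w = f.shape
        rows = f.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_blstm(rows)                 # scan each row of the feature map
        cols = rows.reshape(b, h, w, -1).permute(0, 2, 1, 3).reshape(b * w, h, -1)
        cols, _ = self.col_blstm(cols)                 # scan each column of the feature map
        feats = cols.reshape(b, w, h, -1).permute(0, 2, 1, 3)
        return feats.reshape(b, h * w, -1)             # flatten to a sequence for attention


class AttentionDecoder(nn.Module):
    """LSTM decoder with additive attention and a residual path from the context vector."""

    def __init__(self, vocab_size, enc_dim=512, hidden=256, emb=128):
        super().__init__()
        self.hidden = hidden
        self.embed = nn.Embedding(vocab_size, emb)
        self.attn_enc = nn.Linear(enc_dim, hidden)
        self.attn_dec = nn.Linear(hidden, hidden)
        self.attn_score = nn.Linear(hidden, 1)
        self.lstm = nn.LSTMCell(emb + enc_dim, hidden)
        self.proj = nn.Linear(enc_dim, hidden)         # residual-style shortcut before the output layer
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, enc, targets, sos_id=0):         # enc: (B, T, enc_dim), targets: (B, L)
        b = enc.size(0)
        h = enc.new_zeros(b, self.hidden)
        c = enc.new_zeros(b, self.hidden)
        prev = torch.full((b,), sos_id, dtype=torch.long, device=enc.device)
        logits = []
        for t in range(targets.size(1)):
            # additive attention over all encoder positions
            scores = self.attn_score(torch.tanh(self.attn_enc(enc) + self.attn_dec(h).unsqueeze(1)))
            alpha = torch.softmax(scores, dim=1)       # (B, T, 1)
            context = (alpha * enc).sum(dim=1)         # (B, enc_dim)
            h, c = self.lstm(torch.cat([self.embed(prev), context], dim=1), (h, c))
            logits.append(self.out(h + self.proj(context)))
            prev = targets[:, t]                       # teacher forcing with ground-truth characters
        return torch.stack(logits, dim=1)              # (B, L, vocab)


if __name__ == "__main__":
    vocab_size = 100                                   # hypothetical character set size
    encoder, decoder = RowColumnEncoder(), AttentionDecoder(vocab_size)
    images = torch.randn(2, 1, 64, 256)                # dummy document images
    targets = torch.randint(0, vocab_size, (2, 20))    # dummy ground-truth character ids
    logits = decoder(encoder(images), targets)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                    # end-to-end training step with cross-entropy loss
```

Because the entire pipeline is differentiable, a single cross-entropy objective on the character sequence is enough to train the feature extractor, the row-column encoder, and the decoder jointly, which is what makes segmentation-free training on page images with only transcriptions possible.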




Updated: 2020-05-27