Transformer-Based Approach for Joint Handwriting and Named Entity Recognition in Historical documents, Pattern Recognition Letters


Pattern Recognition Letters (IF 3.9), Pub Date: 2021-11-09, DOI: 10.1016/j.patrec.2021.11.010
Ahmed Cheikh Rouhou (1), Marwa Dhiaf (1, 2, 3), Yousri Kessentini (2, 3), Sinda Ben Salem (1)


Extracting the relevant information carried by named entities in handwritten documents is still a challenging task. Unlike traditional information extraction approaches, which usually treat text transcription and named entity recognition as separate, sequential tasks, we propose in this paper an end-to-end transformer-based approach that performs the two tasks jointly. The proposed approach operates at the paragraph level, which brings two main benefits. First, it allows the model to avoid unrecoverable early errors due to line segmentation. Second, it allows the model to exploit larger two-dimensional context information to identify the semantic categories, reaching a higher final prediction accuracy. We also explore different training scenarios to show their effect on performance, and we demonstrate that a two-stage learning strategy lets the model reach a higher final prediction accuracy. As far as we know, this work presents the first approach that adopts transformer networks for named entity recognition in handwritten documents. We achieve new state-of-the-art performance on the complete task of the ICDAR 2017 Information Extraction competition using the Esposalles database, even though the proposed technique does not use any dictionaries, language modeling, or post-processing.
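To make the idea of joint transcription and entity recognition concrete, the sketch below assumes that a model of this kind emits a single token sequence in which entity categories appear as inline tags around the transcribed words. The abstract does not describe the actual output format, so the tag syntax, the category names, the sample string, and the helper `split_joint_output` are all hypothetical and serve only as an illustration.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax (<category> ... </category>); the abstract does not
# specify the real output vocabulary, so this is an assumption.
TOKEN_RE = re.compile(r"</?[a-z_]+>|[^\s<>]+")


def split_joint_output(decoded: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a tagged transcription into plain text and (word, category) pairs."""
    words: List[str] = []
    labelled: List[Tuple[str, str]] = []
    current = "other"  # category applied to words outside any tag
    for token in TOKEN_RE.findall(decoded):
        if token.startswith("</"):
            current = "other"       # closing tag: back to the default category
        elif token.startswith("<"):
            current = token[1:-1]   # opening tag: strip the angle brackets
        else:
            words.append(token)
            labelled.append((token, current))
    return " ".join(words), labelled


# Made-up example string, not taken from the Esposalles database.
decoded = (
    "<name> Joan </name> <surname> Puig </surname> "
    "pages de <location> Barcelona </location>"
)
text, entities = split_joint_output(decoded)
print(text)
print([pair for pair in entities if pair[1] != "other"])
```

Under this assumption, a single decoding pass yields both the transcription and the semantic categories, which is the property the abstract contrasts with pipelines that transcribe first and tag afterwards.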


Updated: 2021-11-09
