A neural model for text localization, transcription and named entity recognition in full pages
Pattern Recognition Letters (IF 3.9), Pub Date: 2020-05-07, DOI: 10.1016/j.patrec.2020.05.001
Manuel Carbonell, Alicia Fornés, Mauricio Villegas, Josep Lladós

In recent years, the consolidation of deep neural network architectures for information extraction in document images has brought large improvements in the performance of each of the tasks involved in this process, namely text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propose an end-to-end model that combines a one-stage object detection network with branches for the recognition of text and named entities, respectively, so that shared features can be learned simultaneously from the training error of each of the tasks. By doing so, the model jointly performs handwritten text detection, transcription, and named entity recognition at page level in a single feed-forward step. We exhaustively evaluate our approach on different datasets, discussing its advantages and limitations compared to sequential approaches. The results show that the model is capable of benefiting from shared features by simultaneously solving interdependent tasks.
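
The abstract describes the architecture only at a high level. As a rough illustration, the sketch below shows one way such a joint model could be wired up in PyTorch: a shared backbone, a one-stage detection head over the full page, and per-region branches for transcription and named entity classification, all driven by a combined loss. The module names, layer sizes, the (omitted) region-pooling step, and the loss weighting are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch (not the paper's code) of a joint detection / transcription / NER
# model with a shared backbone. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class JointPageModel(nn.Module):
    def __init__(self, vocab_size=100, num_entities=10, num_anchors=9, feat_dim=256):
        super().__init__()
        # Shared feature extractor, updated by the training error of all three tasks.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One-stage detection head: objectness score + 4 box offsets per anchor.
        self.det_head = nn.Conv2d(feat_dim, num_anchors * 5, 1)
        # Transcription branch: per-region features decoded into character logits.
        self.transcriber = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.char_out = nn.Linear(feat_dim, vocab_size)
        # Named entity branch: one semantic category per detected region.
        self.ner_out = nn.Linear(feat_dim, num_entities)

    def forward(self, page, region_feats):
        # page: (B, 1, H, W) full-page image.
        # region_feats: (B, T, feat_dim) pooled features of detected text regions
        # (the region pooling itself is omitted here for brevity).
        fmap = self.backbone(page)
        det = self.det_head(fmap)                      # detection outputs
        seq, _ = self.transcriber(region_feats)
        chars = self.char_out(seq)                     # transcription logits per step
        entities = self.ner_out(region_feats.mean(1))  # entity label per region
        return det, chars, entities

# Joint training would combine the three task losses (weights are assumptions), so
# gradients from detection, transcription and NER all flow into the shared backbone:
#   loss = w_det * detection_loss + w_htr * transcription_loss + w_ner * ner_loss
```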




Updated: 2020-06-25