A two-stage method for text line detection in historical documents
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2019-07-23, DOI: 10.1007/s10032-019-00332-1
Tobias Grüning, Gundram Leifert, Tobias Strauß, Johannes Michael, Roger Labahn

This work presents a two-stage text line detection method for historical documents. Each detected text line is represented by its baseline. In the first stage, a deep neural network called ARU-Net assigns each pixel to one of three classes: baseline, separator, or other. The separator class marks the beginning and end of each text line. The ARU-Net is trainable from scratch with a manageably small number of manually annotated example images (\(<\,50\)), which is achieved through data augmentation strategies. The network predictions serve as input to the second stage, which performs a bottom-up clustering to build baselines. The developed method is capable of handling complex layouts as well as curved and arbitrarily oriented text lines. It substantially outperforms current state-of-the-art approaches. For example, on the complex track of the cBAD: ICDAR2017 Competition on Baseline Detection, the F value increases from 0.859 to 0.922. The framework to train and run the ARU-Net is open source.
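
The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class indices, the function names `label_pixels` and `group_baseline_pixels`, and the use of plain 8-connected component grouping as a stand-in for the paper's bottom-up clustering are all assumptions made for illustration only. The sketch shows how a per-pixel three-class prediction might be reduced to groups of baseline pixels, with separator pixels splitting neighboring lines.

```python
from collections import deque

# Hypothetical class indices; the abstract names the classes but not their encoding.
BASELINE, SEPARATOR, OTHER = 0, 1, 2


def label_pixels(probs):
    """Stage-1 stand-in: give each pixel the class with the highest score.

    probs[y][x] is a sequence of three scores (baseline, separator, other).
    """
    return [[max(range(3), key=lambda c: px[c]) for px in row] for row in probs]


def group_baseline_pixels(labels):
    """Stage-2 stand-in: group 8-connected baseline pixels.

    Returns one sorted list of (x, y) coordinates per group. Separator and
    other pixels are never joined, so a separator between two runs of
    baseline pixels keeps them in different groups.
    """
    h = len(labels)
    w = len(labels[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    groups = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != BASELINE or seen[y][x]:
                continue
            # Breadth-first search from an unvisited baseline pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            group = []
            while queue:
                cx, cy = queue.popleft()
                group.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and not seen[ny][nx]
                                and labels[ny][nx] == BASELINE):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            groups.append(sorted(group))
    return groups
```

On a single row labeled baseline, baseline, separator, baseline, baseline, the sketch yields two groups, one on each side of the separator. A real pipeline would take the labels from the trained network and would have to handle noisy, curved, and arbitrarily oriented predictions, which simple connected-component grouping does not.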

Last updated: 2019-07-23