A combined strategy of analysis for the localization of heterogeneous form fields in ancient pre-printed records
International Journal on Document Analysis and Recognition (IF 1.8), Pub Date: 2018-07-26, DOI: 10.1007/s10032-018-0309-y
Aurélie Lemaitre , Jean Camillerapp , Cérès Carton , Bertrand Coüasnon

This paper deals with the localization of handwritten fields in old pre-printed registers. The images exhibit the usual difficulties of old, damaged documents, and extracting the text is made harder still by the strong interaction between handwriting and printed text. In addition, in many collections the structure of the forms varies with the origin of the documents. This work is applied to a database of Mexican marriage records that was released for a competition at the HIP 2013 workshop and is publicly available. We first show the value and the limitations of the empirical method that was submitted to that competition. We then present a method that combines a logical description of the document contents with the results of an automatic analysis of the physical properties of the collection. A distinctive feature of this analysis is that it requires no ground truth. We show that this combined strategy locates 97.2% of the handwritten fields. The proposed approach is generalizable and could be applied to other databases.

Updated: 2018-07-26