Abstract
This paper addresses the localization of handwritten fields in old pre-printed registers. The images exhibit the usual difficulties of old, damaged documents, and text extraction is further complicated by the strong interaction between handwritten and printed text. In addition, in many collections the structure of the forms varies with the origin of the documents. The work is applied to a database of Mexican marriage records that was published for a competition at the HIP 2013 workshop and is publicly available. We first show the strengths and limitations of the empirical method that was submitted to the competition. We then present a method that combines a logical description of the document contents with the results of an automatic analysis of the physical properties of the collection. A distinctive feature of this analysis is that it does not require any ground truth. We show that this combined strategy locates 97.2% of the handwritten fields. The proposed approach is generalizable and could be applied to other databases.
Cite this article
Lemaitre, A., Camillerapp, J., Carton, C. et al. A combined strategy of analysis for the localization of heterogeneous form fields in ancient pre-printed records. IJDAR 21, 269–282 (2018). https://doi.org/10.1007/s10032-018-0309-y