A combined strategy of analysis for the localization of heterogeneous form fields in ancient pre-printed records

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper addresses the localization of handwritten fields in old pre-printed registers. The images exhibit the usual difficulties of old and damaged documents, and text extraction is further complicated by the close intertwining of handwritten and printed text. In addition, in many collections, the structure of the forms varies with the origin of the documents. This work is applied to a database of Mexican marriage records, which was published for a competition at the HIP 2013 workshop and is publicly available. We first discuss the strengths and limitations of the empirical method that was submitted to the competition. We then present a method that combines a logical description of the document contents with the results of an automatic analysis of the physical properties of the collection. A distinctive feature of this analysis is that it does not require any ground truth. We show that this combined strategy can locate 97.2% of the handwritten fields. The proposed approach is generalizable and could be applied to other databases.

Notes

  1. https://www-intuidoc.irisa.fr/hip_db/.

Author information

Corresponding author

Correspondence to Aurélie Lemaitre.

Cite this article

Lemaitre, A., Camillerapp, J., Carton, C. et al. A combined strategy of analysis for the localization of heterogeneous form fields in ancient pre-printed records. IJDAR 21, 269–282 (2018). https://doi.org/10.1007/s10032-018-0309-y

  • DOI: https://doi.org/10.1007/s10032-018-0309-y
