Recognition-based character segmentation for multi-level writing style

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Character segmentation is an important task in optical character recognition (OCR): the quality of any OCR system depends heavily on its character segmentation algorithm. Despite the many character segmentation methods proposed to date, existing methods cannot satisfactorily segment characters written in complex writing styles, such as that of Lanna Dhamma characters. In this paper, a new character segmentation method named graph partitioning-based character segmentation is proposed to address the problem. The proposed method can deal with multi-level writing styles as well as touching and broken characters, and can be regarded as a generalization of existing approaches to multi-level writing styles. It consists of three phases. In the first phase, a newly devised over-segmentation technique based on the morphological skeleton is used to obtain redundant fragments of a word image. In the second phase, the fragments are used to form a segmentation hypotheses graph. In the last phase, the hypotheses graph is partitioned into subgraphs, each corresponding to a segmented character, using a partitioning algorithm developed specifically for character segmentation. Experimental results on handwritten Lanna Dhamma character datasets showed that the proposed method achieved a high correct segmentation rate and outperformed existing methods for the Lanna Dhamma alphabet.
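
The idea of partitioning a graph of fragments into character-sized subgraphs can be illustrated with a small sketch. This is not the partitioning algorithm of the paper, which the abstract does not specify; it is a brute-force toy under stated assumptions: fragments are integer ids, `adjacency` records which fragments touch or lie close to one another (including fragments stacked above or below, as in a multi-level writing style), and `score` is a hypothetical stand-in for a character recognizer's confidence that a group of fragments forms one character. All names and values below are illustrative.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Return True if `nodes` induce a connected subgraph of `adjacency`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current] & nodes:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def connected_partitions(fragments, adjacency):
    """Yield every partition of `fragments` into connected groups."""
    if not fragments:
        yield []
        return
    first, rest = fragments[0], fragments[1:]
    # Decide which of the remaining fragments share a group with `first`.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            group = (first,) + extra
            if not is_connected(group, adjacency):
                continue
            remaining = [f for f in rest if f not in extra]
            for tail in connected_partitions(remaining, adjacency):
                yield [frozenset(group)] + tail


def best_partition(fragments, adjacency, score):
    """Pick the partition whose groups have the highest total score."""
    return max(
        connected_partitions(list(fragments), adjacency),
        key=lambda partition: sum(score(group) for group in partition),
    )


# Toy word image (assumed): fragment 1 touches fragments 0 and 2,
# and fragment 3 is a mark written above fragment 1.
adjacency = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}

# Hypothetical recognizer confidences; unlisted groups get a low default.
confidences = {
    frozenset({0, 1}): 0.9,  # two pieces of one broken character
    frozenset({2}): 0.8,
    frozenset({3}): 0.7,
}


def score(group):
    return confidences.get(group, 0.1)


best = best_partition(adjacency.keys(), adjacency, score)
print([sorted(group) for group in best])  # [[0, 1], [2], [3]]
```

Exhaustive enumeration grows exponentially with the number of fragments, so it is only workable for a handful of them. Summing the group scores is also an assumption made for brevity, and it tends to favor partitions with more groups.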

Acknowledgements

This study was funded under the Royal Golden Jubilee Ph.D. Program by the Thailand Research Fund (Grant No. PHD/0185/2556). We would like to thank Chiang Mai University, Thailand, for financial support and collection of digital Lanna archives. We also thank National Chiao Tung University, Taiwan, for supporting this work in the Intelligent Computing Laboratory.

Author information

Corresponding author

Correspondence to Jeerayut Chaijaruwanich.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 141 KB)

Supplementary material 2 (pdf 724 KB)

About this article

Cite this article

Inkeaw, P., Bootkrajang, J., Charoenkwan, P. et al. Recognition-based character segmentation for multi-level writing style. IJDAR 21, 21–39 (2018). https://doi.org/10.1007/s10032-018-0302-5

  • DOI: https://doi.org/10.1007/s10032-018-0302-5
