Abstract
Extensive research has been carried out on the recognition of handwritten text in various languages, including Assamese, Bangla, English, Gujarati, Hindi, Marathi, Punjabi and Tamil. Recognition of multi-lingual text documents, however, remains a challenge in the field of pattern recognition. This paper presents a study of various features and classifiers for the recognition of pre-segmented handwritten multi-lingual characters from English, Hindi and Punjabi. In the feature extraction phase, four techniques are considered: zoning features, diagonal features, horizontal peak extent based features, and intersection and open end point based features. In the classification phase, three classifiers are evaluated: k-NN, Linear-SVM and MLP. Various combinations of these features and classifiers have also been tested. For script identification, we achieved a maximum accuracy of 92.89% using a combination of the Linear-SVM, k-NN and MLP classifiers. For character recognition, we achieved accuracies of 92.18%, 84.67% and 86.79% for English, Hindi and Punjabi, respectively.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Kumar, M., Jindal, S.R. A Study on Recognition of Pre-segmented Handwritten Multi-lingual Characters. Arch Computat Methods Eng 27, 577–589 (2020). https://doi.org/10.1007/s11831-019-09332-0
DOI: https://doi.org/10.1007/s11831-019-09332-0