Abstract
Takri is a regional class of Indian scripts used in the hilly areas of north-west India, including Jammu and Kashmir (J&K), Himachal Pradesh (H.P.), Punjab and Uttarakhand. The script varies widely, with about 13 variants identified across the region. We observe that no work on text identification and recognition of Takri script has been done so far. Our work therefore focuses on identifying and classifying the challenges the script poses, based on a comparative analysis of existing text segmentation approaches, since correct segmentation of text leads to more accurate machine recognition. As no metal fonts were available for the script, machine-printed data had to be collected to address the text identification problem. The paper surveys different text segmentation approaches and, based on the structural properties of the script, implements them on Takri data in three steps: the Gurmukhi segmentation technique, the connected component segmentation approach, and the Gurmukhi touching-character segmentation approach. Results are analyzed for segmentation accuracy, and the challenges are identified along with their statistical analysis. The identified challenges, namely half-forms, numerous types of touching characters, and overlapping bounding boxes, are then classified. The effectiveness of this classification was evaluated using the Naïve Bayes machine learning algorithm. The results showed 80% accuracy in text identification and classification of Takri script.
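As a rough illustration of the connected component step and of the overlapping-bounding-box challenge named in the abstract, the sketch below labels 8-connected foreground pixels in a small binary image and reports which component boxes overlap. It is not the paper's implementation: the function names `connected_components` and `boxes_overlap`, the choice of 8-connectivity, and the toy image are all assumptions made for this example.

```python
from collections import deque

def connected_components(image):
    """Return one bounding box per 8-connected component of 1-pixels.

    Boxes are (top, left, bottom, right), inclusive, in row-major
    order of each component's first pixel. Hypothetical helper.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def boxes_overlap(a, b):
    """True when two inclusive (top, left, bottom, right) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Toy binary image (assumed data): an L-shaped stroke, an isolated dot that
# sits inside the L's bounding box, and a short separate stroke on the right.
TOY_IMAGE = [
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
]

boxes = connected_components(TOY_IMAGE)
overlapping_pairs = [
    (i, j)
    for i in range(len(boxes))
    for j in range(i + 1, len(boxes))
    if boxes_overlap(boxes[i], boxes[j])
]
print(boxes)
print(overlapping_pairs)
```

In this toy case the dot and the L-shaped stroke are separate components whose boxes intersect, which is the kind of situation where cutting the image along box boundaries alone would not isolate each symbol cleanly.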
Acknowledgements
We thank Padmashri Vijay Sharma, Artist, and Takri expert, Bhuri Singh Museum, Chamba, H.P. for sharing his valuable expertise on this rare script and providing immense help for gathering rare books/material written in the script. We also thank Dr Shiv Nirmohi, a Dogri writer and Dr Sangeeta Sharma, State Archival Department, Jammu for providing all kinds of support needed for researching Takri script.
About this article
Cite this article
Magotra, S., Kaushik, B. & Kaul, A. A Comparative analysis for identification and classification of text segmentation challenges in Takri Script. Sādhanā 45, 146 (2020). https://doi.org/10.1007/s12046-020-01384-4
DOI: https://doi.org/10.1007/s12046-020-01384-4