Abstract
Script recognition has many real-life applications, such as optical character recognition, document archiving, writer identification, and searching within documents. Automatic script recognition from multilingual documents is a challenging task in which the system must identify and recognize the several types of script that may appear on a single page. In offline script recognition, printed or handwritten documents are first scanned and then passed to the script recognition process, whereas in online script recognition the documents are already in soft-copy form. Most script recognition techniques presented by researchers so far are based on traditional image processing frameworks. Recently, however, deep learning-based techniques have been observed to perform script recognition more efficiently and more accurately. This paper provides a comprehensive survey of techniques from the last few decades for identifying and recognizing multilingual scripts, focusing mainly on Indic scripts. Some relevant non-Indic script identification works are also included for ease of understanding. We hope that this survey can serve as a compendium and provide future directions to researchers developing generic OCRs.
Cite this article
Sinwar, D., Dhaka, V.S., Pradhan, N. et al. Offline script recognition from handwritten and printed multilingual documents: a survey. IJDAR 24, 97–121 (2021). https://doi.org/10.1007/s10032-021-00365-5
DOI: https://doi.org/10.1007/s10032-021-00365-5