Abstract
This paper introduces a large-scale Meitei Mayek handwritten character database that covers the complete character set of the script. The database contains 85,124 character images in 55 character classes, split into 72,330 training images and 12,794 test images. To capture the natural handwriting of individuals, samples were collected in two phases: (a) unconstrained handwriting in the form of answer sheets and classroom notes and (b) tabular forms. Nearly 500 individuals contributed to the development of the database. Recognition of the character images is carried out using different feature descriptors with four popular classifiers, namely k-nearest neighbors (KNN), Linear Support Vector Classifier, Random Forest and Support Vector Machine. The paper also proposes a convolutional neural network (CNN) model obtained by tuning the hyperparameters of a base CNN architecture. Experimental results show that the CNN model achieves a test accuracy of 95.56% on the database, a result that can serve as a benchmark for it.
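The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the helper `split_summary` and the constant names are hypothetical, the per-class figure is a plain average (the abstract does not state that the classes are balanced), and the count of correctly classified test images is an approximation inferred from the rounded accuracy, not a number reported by the authors.

```python
# Figures quoted in the abstract.
TOTAL_IMAGES = 85_124
NUM_CLASSES = 55
TRAIN_IMAGES = 72_330
TEST_IMAGES = 12_794
TEST_ACCURACY = 0.9556  # reported CNN test accuracy (95.56%)


def split_summary(train: int, test: int, classes: int) -> dict:
    """Derive basic split statistics from the raw counts (hypothetical helper)."""
    total = train + test
    return {
        "total": total,
        "train_fraction": train / total,
        "test_fraction": test / total,
        # Average only: the abstract does not say the classes are balanced.
        "mean_images_per_class": total / classes,
    }


summary = split_summary(TRAIN_IMAGES, TEST_IMAGES, NUM_CLASSES)

# Approximate number of correctly classified test images implied by the
# reported accuracy. This is an inference from a rounded percentage.
approx_correct = round(TEST_ACCURACY * TEST_IMAGES)

print(f"total images          : {summary['total']}")
print(f"train / test fraction : {summary['train_fraction']:.4f} / {summary['test_fraction']:.4f}")
print(f"mean images per class : {summary['mean_images_per_class']:.1f}")
print(f"approx. correct tests : {approx_correct} of {TEST_IMAGES}")
```

The training and test counts sum to the stated total, and the split works out to roughly 85% training and 15% test data.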
Cite this article
Hijam, D., Saharia, S. On developing complete character set Meitei Mayek handwritten character database. Vis Comput 38, 525–539 (2022). https://doi.org/10.1007/s00371-020-02032-y
DOI: https://doi.org/10.1007/s00371-020-02032-y