Ligature categorization based Nastaliq Urdu recognition using deep neural networks

  • S.I.: CMKBO
Computational and Mathematical Organization Theory

Abstract

The cursive nature of the script, the Nastaliq writing style, and the large number of distinct ligatures make ligature recognition in Urdu very difficult. In this paper, we present a segmentation-free approach that recognizes Urdu ligatures holistically. First, we generate a rich dataset of 17,010 ligatures with different orientations and different degrees of noise. Second, the ligatures are clustered (categorized) to reduce the search space and make learning more robust. Finally, we employ a deep neural network with dropout regularization to classify the ligatures. Detailed experiments show that combining a deep neural network with dropout regularization and ligature clustering significantly enhances classification accuracy.
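
The two ideas named in the abstract, categorizing ligatures to shrink the search space and regularizing a network with dropout, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the clustering method, the features, or the network architecture, so plain k-means stands in for the categorization step, the 2-D feature vectors and the labels `lig_a` to `lig_d` are made up, and `inverted_dropout` shows only the common "inverted" form of dropout applied to one layer's activations.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def kmeans(points, init_centroids, iters=10):
    """Plain k-means from explicit starting centroids (a stand-in, not the paper's method)."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def candidates_per_cluster(points, labels, centroids):
    """Map each cluster index to the set of ligature labels that fall into it."""
    table = {}
    for p, lab in zip(points, labels):
        table.setdefault(nearest(p, centroids), set()).add(lab)
    return table


def inverted_dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability `drop_prob` and rescale the survivors."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical 2-D feature vectors for four made-up ligature classes.
points = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9]]
labels = ["lig_a", "lig_b", "lig_c", "lig_d"]

centroids = kmeans(points, init_centroids=[points[0], points[2]])
table = candidates_per_cluster(points, labels, centroids)

# A query is first routed to a cluster, so a classifier only has to
# discriminate among that cluster's candidates rather than all labels.
query_cluster = nearest([5.1, 5.0], centroids)
print(sorted(table[query_cluster]))

# Dropout is active only during training; at inference activations pass through.
rng = random.Random(0)
print(inverted_dropout([1.0, 2.0, 3.0], drop_prob=0.5, rng=rng))
```

In this toy setup each cluster holds two of the four labels, so a per-cluster classifier faces half the candidates. The same routing idea is what lets categorization reduce the search space when the label set is large.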

Notes

  1. www.cle.org.pk/software/ling_resources/UrduHighFreqLigature.htm.

Author information

Corresponding author

Correspondence to Zia ur Rehman.

About this article

Cite this article

Rafeeq, M.J., ur Rehman, Z., Khan, A. et al. Ligature categorization based Nastaliq Urdu recognition using deep neural networks. Comput Math Organ Theory 25, 184–195 (2019). https://doi.org/10.1007/s10588-018-9271-y

  • DOI: https://doi.org/10.1007/s10588-018-9271-y