Parts-of-Speech tagging for Malayalam using deep learning techniques

  • Original Research
International Journal of Information Technology

Abstract

Parts-of-speech tagging is the process of labeling each word in a sentence with its corresponding part of speech. It is a common pre-processing step for many natural language processing tasks. Early approaches relied on simple heuristics; later methods reported in the literature incorporated machine learning techniques such as artificial neural networks. More recently, with the advancement of deep learning-based approaches, parts-of-speech tagging has become more accurate, and a reasonable number of taggers are now available for high-resource languages such as English. However, low-resource languages such as Malayalam still lack computationally efficient and accurate methods for parts-of-speech tagging. To address this gap, this work proposes a deep learning-based approach for parts-of-speech tagging for the Malayalam language. Experiments conducted on real datasets show that the proposed method outperforms some of the existing methods in terms of precision and accuracy.

Acknowledgements

The authors would like to thank all the researchers and staff members of Data Engineering Lab for their constructive comments and feedback that significantly improved the quality of this paper. The authors also acknowledge the people who provided the tagged dataset for Malayalam that is used in this work.

Author information

Corresponding author

Correspondence to V. S. Anoop.

About this article

Cite this article

Akhil, K.K., Rajimol, R. & Anoop, V.S. Parts-of-Speech tagging for Malayalam using deep learning techniques. Int. j. inf. tecnol. 12, 741–748 (2020). https://doi.org/10.1007/s41870-020-00491-z

  • DOI: https://doi.org/10.1007/s41870-020-00491-z
