Parts-of-Speech tagging for Malayalam using deep learning techniques

Akhil, K. K.; Rajimol, R.; Anoop, V. S.

doi:10.1007/s41870-020-00491-z

Original Research
Published: 16 June 2020

Volume 12, pages 741–748 (2020)

International Journal of Information Technology

Abstract

Parts-of-speech tagging is a process in linguistics that assigns each word in a sentence its corresponding part of speech. It is considered one of the pre-processing steps for many natural language processing tasks. Earlier approaches were based on simple heuristics, and several later methods reported in the literature incorporated machine learning techniques such as artificial neural networks. More recently, with the advancement of deep learning-based approaches, parts-of-speech tagging has become more accurate, and a reasonable number of taggers are now available for high-resource languages such as English. However, low-resource languages such as Malayalam still lack computationally efficient and accurate methods and techniques for parts-of-speech tagging. To address this gap, this work proposes a deep learning-based approach to parts-of-speech tagging for the Malayalam language. Experiments conducted on real datasets show that the proposed method outperforms some of the existing methods in terms of precision and accuracy.
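Two ideas from the abstract can be made concrete with a small Python sketch: the older heuristic taggers, and evaluation by precision and accuracy. This is not the paper's deep learning model, dataset, or tagset. The romanized Malayalam sentences, the coarse PRON/NOUN/VERB tags, and the helper names train_most_frequent_tag, tag, and evaluate are all hypothetical. The sketch assumes a most-frequent-tag heuristic and token-level metrics, because the abstract does not state how the paper defines precision and accuracy.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: romanized Malayalam sentences with coarse,
# Universal-style tags. Illustrative only; not the paper's dataset or tagset.
TRAIN = [
    [("njan", "PRON"), ("pusthakam", "NOUN"), ("vaayichu", "VERB")],   # "I read the book"
    [("avan", "PRON"), ("veettil", "NOUN"), ("poyi", "VERB")],         # "he went home"
    [("kutti", "NOUN"), ("pusthakam", "NOUN"), ("vaayichu", "VERB")],  # "the child read the book"
]


def train_most_frequent_tag(sentences):
    """Heuristic baseline (hypothetical helper): remember each word's most
    frequent tag, plus the corpus-wide most frequent tag for unseen words."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, pos in sentence:
            word_tags[word][pos] += 1
            all_tags[pos] += 1
    lexicon = {word: counts.most_common(1)[0][0] for word, counts in word_tags.items()}
    default = all_tags.most_common(1)[0][0]
    return lexicon, default


def tag(tokens, lexicon, default):
    """Tag each token independently; the surrounding context is ignored."""
    return [lexicon.get(token, default) for token in tokens]


def evaluate(gold, pred):
    """Token-level accuracy and per-tag precision for aligned tag sequences.
    Tags that were never predicted get no precision entry (undefined)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    predicted = Counter(pred)
    true_positives = Counter(p for g, p in zip(gold, pred) if g == p)
    precision = {t: true_positives[t] / n for t, n in predicted.items()}
    return accuracy, precision


lexicon, default = train_most_frequent_tag(TRAIN)
tokens = ["avan", "veettil", "vannu"]  # "he came home"; "vannu" is unseen
gold = ["PRON", "NOUN", "VERB"]
pred = tag(tokens, lexicon, default)
accuracy, precision = evaluate(gold, pred)
print(pred)               # ['PRON', 'NOUN', 'NOUN']
print(f"{accuracy:.3f}")  # 0.667
print(precision)          # {'PRON': 1.0, 'NOUN': 0.5}
```

The unseen verb "vannu" ("came") falls back to the default NOUN tag because the heuristic looks only at the word itself and ignores its neighbours. VERB receives no precision entry because it was never predicted.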

Acknowledgements

The authors would like to thank all the researchers and staff members of Data Engineering Lab for their constructive comments and feedback that significantly improved the quality of this paper. The authors also acknowledge the people who provided the tagged dataset for Malayalam that is used in this work.

Author information

Authors and Affiliations

Indian Institute of Information Technology and Management-Kerala (IIITM-K) Technopark Campus, Thiruvananthapuram, Kerala, 695581, India
K. K. Akhil, R. Rajimol & V. S. Anoop

Corresponding author

Correspondence to V. S. Anoop.

About this article

Cite this article

Akhil, K.K., Rajimol, R. & Anoop, V.S. Parts-of-Speech tagging for Malayalam using deep learning techniques. Int. j. inf. tecnol. 12, 741–748 (2020). https://doi.org/10.1007/s41870-020-00491-z

Received: 04 January 2020
Accepted: 01 June 2020
Published: 16 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s41870-020-00491-z
