Named entity recognition: a semi-supervised learning approach

  • Original Research
International Journal of Information Technology

Abstract

Developing named entity recognition (NER) applications for languages that lack NLP resources is usually hindered by the shortage of named-entity-tagged datasets, which degrades performance. Amharic is a case in point: obtaining annotated training data for NER is prohibitively expensive, although an enormous amount of untagged data is easily accessible. Fortunately, NER performance can potentially be improved by combining a small amount of labeled data with a large collection of unlabeled data. On this premise, this paper investigates graph-based label propagation, a simple, iterative semi-supervised algorithm that propagates labels through the dataset, for the Amharic NER problem. It also aims at a rigorous comparison with the expectation-maximization (EM) semi-supervised learning approach. The experiments show that label-propagation-based NER outperforms expectation-maximization when only a few labeled training examples are available. Because the expectation-maximization algorithm needs a moderate number of labeled examples to learn from, very few labeled examples are not enough to estimate adequate parameters for recognizing named entities, and consequently it does not perform as well as the label propagation algorithm.
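
To make the idea of propagating labels through a dataset concrete, the following is a minimal sketch of graph-based label propagation in plain Python. It is not the authors' implementation. The function name `label_propagation`, the Gaussian-kernel edge weights, the row-normalised transition matrix, the `sigma` value, and the six two-dimensional toy points are all assumptions made for illustration; real NER features for Amharic tokens would look quite different.

```python
import math

def label_propagation(points, labels, n_classes, sigma=1.0, max_iter=1000, tol=1e-6):
    """Propagate a few known labels to unlabeled points over a similarity graph.

    points: list of numeric feature vectors (hypothetical toy features here).
    labels: class index for labeled points, None for unlabeled ones.
    """
    n = len(points)

    # Edge weight between two nodes: Gaussian kernel on squared distance (assumed).
    def weight(a, b):
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-sq_dist / sigma ** 2)

    W = [[weight(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Row-normalise the weights so each row is a probability distribution.
    T = []
    for row in W:
        total = sum(row)
        T.append([w / total for w in row])

    # Clamping resets every labeled node to its known one-hot label.
    def clamp(Y):
        for i, lab in enumerate(labels):
            if lab is not None:
                Y[i] = [1.0 if c == lab else 0.0 for c in range(n_classes)]
        return Y

    # Start unlabeled nodes from a uniform distribution over classes.
    Y = clamp([[1.0 / n_classes] * n_classes for _ in range(n)])

    for _ in range(max_iter):
        # One propagation step: each node averages its neighbours' label distributions.
        Y_new = [
            [sum(T[i][j] * Y[j][c] for j in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        Y_new = clamp(Y_new)
        delta = max(
            abs(Y_new[i][c] - Y[i][c]) for i in range(n) for c in range(n_classes)
        )
        Y = Y_new
        if delta < tol:
            break

    # Predicted class for every node is the most probable label.
    return [max(range(n_classes), key=lambda c: row[c]) for row in Y]


# Toy data: two well-separated clusters with one labeled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels = [0, None, None, 1, None, None]
predicted = label_propagation(points, labels, n_classes=2)
print(predicted)
```

With only one labeled point per cluster, the unlabeled points take the label of the labeled point they are most strongly connected to, which is the behaviour the abstract relies on when labeled data are scarce.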

Author information

Corresponding author

Correspondence to H. Sintayehu.

About this article

Cite this article

Sintayehu, H., Lehal, G.S. Named entity recognition: a semi-supervised learning approach. Int. j. inf. tecnol. 13, 1659–1665 (2021). https://doi.org/10.1007/s41870-020-00470-4

  • DOI: https://doi.org/10.1007/s41870-020-00470-4
