Abstract
Developing named entity recognition (NER) applications for languages with few NLP resources is usually hindered by the lack of named-entity-tagged datasets, which degrades performance. Amharic is a case in point: annotated training data for NER is prohibitively expensive to obtain, although an enormous amount of untagged data is easily accessible. Fortunately, NER performance can potentially be boosted by combining a small amount of labeled data with a large collection of unlabeled data. Based on this premise, this paper investigates a graph-based label propagation algorithm, a simple semi-supervised, iterative algorithm that propagates labels through the dataset, for the Amharic NER problem. It also aims at a rigorous comparison with the expectation-maximization (EM) semi-supervised learning approach. The experiments reveal that label propagation based NER achieves superior performance compared to expectation-maximization when only a few labeled training examples are available. Because the expectation-maximization algorithm requires a moderate number of labeled examples to learn from, very few labeled examples are not enough to estimate adequate parameters for recognizing named entities; consequently, it could not perform as well as the label propagation algorithm.
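To make the idea of propagating labels through a dataset concrete, the following is a minimal sketch of a generic clamped label propagation loop in NumPy. It is not the authors' implementation: the function name `label_propagation`, the Gaussian similarity with bandwidth `sigma`, the fully connected graph, and the toy two-dimensional points are all assumptions chosen for illustration. In an NER setting each row would presumably be a feature vector for a token and each class an entity tag, but the abstract does not specify the features used.

```python
import numpy as np

def label_propagation(X, y, sigma=1.0, max_iter=1000, tol=1e-6):
    """Propagate labels over a similarity graph (illustrative sketch).

    X: (n, d) feature vectors; y: (n,) class indices, with -1 for unlabeled.
    Returns the predicted class index for every point.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0
    n_classes = int(y[labeled].max()) + 1

    # Assumed graph: fully connected, Gaussian edge weights on squared distance.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    T = W / W.sum(axis=1, keepdims=True)  # row-normalised transition matrix

    # Label distributions: one-hot for labeled points, uniform for the rest.
    F = np.full((len(y), n_classes), 1.0 / n_classes)
    F[labeled] = np.eye(n_classes)[y[labeled]]
    clamp = F[labeled].copy()

    for _ in range(max_iter):
        F_new = T @ F            # each node absorbs its neighbours' labels
        F_new[labeled] = clamp   # clamp the known labels back in place
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:
            break
    return F.argmax(axis=1)

# Toy data (hypothetical): two well-separated clusters, one labeled point each.
X_toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
y_toy = [0, -1, -1, 1, -1, -1]
pred = label_propagation(X_toy, y_toy)
print(pred)  # [0 0 0 1 1 1]
```

With only one labeled example per cluster, every unlabeled point inherits the label of the labeled point it is most strongly connected to, which illustrates why this family of methods can work with very few labeled examples.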
Cite this article
Sintayehu, H., Lehal, G.S. Named entity recognition: a semi-supervised learning approach. Int. j. inf. tecnol. 13, 1659–1665 (2021). https://doi.org/10.1007/s41870-020-00470-4
DOI: https://doi.org/10.1007/s41870-020-00470-4