SVM ensembles for named entity disambiguation

Computing

Abstract

The enormous quantity of digital data necessitates automation, which, among other things, can help link unstructured data to structured data. Such a task requires a systematic approach to mapping entity mentions (e.g., person, location) to corresponding entries in a Knowledge Base. Research in this area is evolving rapidly, which has led to the popularization of Named Entity Disambiguation (NED). NED, also known as Entity Linking, is the task of resolving the ambiguities that arise when processing unstructured data packed with Named Entities. The goal of this paper is to investigate ensemble learning using Support Vector Machines (SVM) for tackling the NED problem. Multiple ensemble learning algorithms were studied, including bagging, boosting, and voting, using different SVM kernel functions, including Linear, RBF, and Polynomial kernels. Our results on three benchmark corpora show that ensemble learning using SVM produces competitive performance compared to well-known entity annotation systems and ensemble models. Specifically, the proposed method performed best on the disambiguation of AIDA/CONLL-TestB and AQUAINT, with F-measures of 78.5% and 71.5%, respectively.
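To make the voting and F-measure ideas in the abstract concrete, here is a minimal, standard-library-only sketch. It is not the authors' implementation: the three prediction lists stand in for the outputs of hypothetical SVM base classifiers (linear, RBF, and polynomial kernels), the entity labels and the `NIL` convention are invented for illustration, and the tie-breaking rule is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by hard (majority) voting.

    predictions: list of equal-length label lists, one per base classifier.
    Ties are broken in favour of the earliest classifier in the list
    (an assumption made for this sketch).
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label, in classifier order, that reaches the top count.
        combined.append(next(label for label in labels if counts[label] == best))
    return combined


def f_measure(gold, predicted, nil="NIL"):
    """Return (precision, recall, F1) over mention-level links.

    A prediction is correct when it equals the gold entity. Mentions
    predicted as `nil` are treated as unanswered (hypothetical convention).
    """
    correct = sum(1 for g, p in zip(gold, predicted) if p != nil and g == p)
    answered = sum(1 for p in predicted if p != nil)
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up gold links and made-up outputs of three base classifiers.
gold = ["Paris", "Paris_Hilton", "Jordan", "Michael_Jordan", "Amazon_River"]
linear_svm = ["Paris", "Paris", "Jordan", "Michael_Jordan", "Amazon.com"]
rbf_svm = ["Paris", "Paris_Hilton", "Jordan_River", "Michael_Jordan", "Amazon.com"]
poly_svm = ["Paris", "Paris_Hilton", "Jordan", "NIL", "Amazon_River"]

combined = majority_vote([linear_svm, rbf_svm, poly_svm])
precision, recall, f1 = f_measure(gold, combined)
print(combined)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy data the vote corrects errors that individual classifiers make on single mentions, but it cannot recover a mention on which two of the three classifiers agree on the wrong entity. Libraries such as scikit-learn offer `SVC`, `BaggingClassifier`, `AdaBoostClassifier`, and `VotingClassifier` for building such ensembles; whether the paper used them is not stated in this abstract.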

Notes

  1. https://archive.org/details/wikipediadumps.

Acknowledgements

This work was supported by the Research Center of the College of Computer and Information Sciences, King Saud University. The authors are grateful for this support and to the anonymous reviewers for their insightful feedback.

Funding

This research was supported by a special fund in the Research Centre of the College of Computer and Information Sciences at King Saud University.

Author information

Corresponding author

Correspondence to Amal Alokaili.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alokaili, A., Menai, M.E.B. SVM ensembles for named entity disambiguation. Computing 102, 1051–1076 (2020). https://doi.org/10.1007/s00607-019-00748-x

  • DOI: https://doi.org/10.1007/s00607-019-00748-x
