HEEL: exploratory entity linking for heterogeneous information networks

  • Regular Paper
Knowledge and Information Systems

Abstract

A heterogeneous information network (HIN) is a ubiquitous data model consisting of multiple types of entities and relations. Entity names in HINs are inherently ambiguous, which makes it difficult to fully disambiguate a HIN. In this paper, we introduce the task of exploratory entity linking for HINs. Given a partially disambiguated HIN, we aim to link ambiguous names to disambiguated entities in the HIN when their referent entities are present. We also “explore” other alternatives by discovering new entities and adding them to the HIN. We propose a partial classification EM-based approach to address this task. A constrained probability propagation model links surface names to entities in the HIN, and the new entity detection process is modeled as a maximum edge weight clique problem. Experiments illustrate that our method outperforms state-of-the-art methods for entity linking with HINs and author name disambiguation.
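
To make the maximum edge weight clique formulation concrete, the following minimal sketch finds the clique with the largest total edge weight in a tiny graph by exhaustive search. It is an illustration only, not the paper's method: the record names, the similarity values, and the function `max_edge_weight_clique` are hypothetical, and brute force is used purely for readability (the problem is NP-hard, and reference [1] discusses a quadratic programming approach).

```python
from itertools import combinations


def max_edge_weight_clique(nodes, weights):
    """Return (best_clique, best_weight) by exhaustive search.

    `weights` maps frozenset({u, v}) to an edge weight; a missing
    pair means the two nodes are not connected.
    """
    best_clique, best_weight = (), 0.0
    for size in range(2, len(nodes) + 1):
        for subset in combinations(nodes, size):
            pairs = [frozenset(p) for p in combinations(subset, 2)]
            # A subset is a clique only if every pair is connected.
            if all(p in weights for p in pairs):
                total = sum(weights[p] for p in pairs)
                if total > best_weight:
                    best_clique, best_weight = subset, total
    return best_clique, best_weight


# Hypothetical unlinked records with made-up pairwise similarities.
records = ["r1", "r2", "r3", "r4"]
similarity = {
    frozenset({"r1", "r2"}): 0.9,
    frozenset({"r1", "r3"}): 0.8,
    frozenset({"r2", "r3"}): 0.7,
    frozenset({"r3", "r4"}): 0.95,
}

clique, weight = max_edge_weight_clique(records, similarity)
print(clique, round(weight, 2))
```

In this toy graph the triangle r1, r2, r3 wins over the single heavier edge between r3 and r4, because the objective sums all edge weights inside the clique. Under the assumption that edge weights encode how likely two records refer to the same entity, such a clique could be read as one candidate new entity.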

Notes

  1. In meta-path descriptions, we use P, A, T, V and Y to represent any nodes (i.e., entities) in the FDS of type paper, author, term, venue and year, respectively.

  2. http://dblp.dagstuhl.de/xml/release/.

  3. https://tartarus.org/martin/PorterStemmer/.

  4. To our knowledge, some other EL methods, such as [23], also consider the NIL issue. However, their task is to link mentions in plain text to entities in knowledge bases, and they are not easy to adapt to EL with HINs.

  5. There are no unlinkable records for the remaining two author names.

References

  1. Alidaee B, Glover F, Kochenberger GA, Wang H (2007) Solving the maximum edge weight clique problem via unconstrained quadratic programming. Eur J Oper Res 181(2):592–597

  2. Bagga A, Baldwin B (1998) Entity-based cross-document coreferencing using the vector space model. In: ACL-COLING, pp 79–85

  3. Bunescu RC, Pasca M (2006) Using encyclopedic knowledge for named entity disambiguation. In: EACL

  4. Carmel D, Chang M-W, Gabrilovich E, Hsu B-JP, Wang K (2014) ERD’14: entity recognition and disambiguation challenge. In: SIGIR Forum, vol 48, no 2, pp 63–77

  5. Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–332

  6. Chiang M-F, Liou J-J, Wang J-L, Peng W-C, Shan M-K (2013) Exploring heterogeneous information networks and random walk with restart for academic search. Knowl Inf Syst 36(1):59–82

  7. Cornolti M, Ferragina P, Ciaramita M, Rüd S, Schütze H (2016) A piggyback system for joint entity mention detection and linking in web queries. In: WWW, pp 567–578

  8. Dalvi BB, Cohen WW, Callan J (2013) Exploratory learning. In: ECML-PKDD, pp 128–143

  9. Ferreira AA, Gonçalves MA, Laender AHF (2012) A brief survey of automatic methods for author name disambiguation. In: SIGMOD Record, vol 41, no 2, pp 15–26

  10. Ganea O-E, Ganea M, Lucchi A, Eickhoff C, Hofmann T (2016) Probabilistic bag-of-hyperlinks model for entity linking. In: WWW, pp 927–938

  11. Han X, Sun L, Zhao J (2011) Collective entity linking in web text: a graph-based method. In: SIGIR, pp 765–774

  12. Kanani PH, McCallum A, Pal C (2007) Improving author coreference by resource-bounded information gathering from the web. In: IJCAI, pp 429–434

  13. Lao N, Cohen WW (2010) Relational retrieval using a combination of path-constrained random walks. Mach Learn 81(1):53–67

  14. Li C, Cheung WK, Ye Y, Zhang X, Chu D-H, Li X (2015) The author-topic-community model for author interest profiling and community discovery. Knowl Inf Syst 44(2):359–383

  15. Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. In: PVLDB, vol 4, no 11, pp 956–967

  16. Li S, Cong G, Miao C (2012) Author name disambiguation using a new categorical distribution similarity. In: ECML-PKDD, pp 569–584

  17. Li Y, Tan S, Sun H, Han J, Roth D, Yan X (2016) Entity disambiguation with linkless knowledge bases. In: WWW, pp 1261–1270

  18. Pitts M, Savvana S, Roy SB, Mandava V (2014) ALIAS: author disambiguation in Microsoft academic search engine dataset. In: EDBT, pp 648–651

  19. Qian Y, Hu Y, Cui J, Zheng Q, Nie Z (2011) Combining machine learning and human judgment in author disambiguation. In: CIKM, pp 1241–1246

  20. Shen W, Han J, Wang J (2014) A probabilistic model for linking named entities in web text with heterogeneous information networks. In: SIGMOD, pp 1199–1210

  21. Shen W, Wang J, Han J (2015) Entity linking with a knowledge base: issues, techniques, and solutions. TKDE 27(2):443–460

  22. Shen W, Wang J, Luo P, Wang M (2012) LIEGE: link entities in web lists with knowledge base. In: KDD, pp 1424–1432

  23. Shen W, Wang J, Luo P, Wang M (2012) LINDEN: linking named entities with knowledge base via semantic knowledge. In: WWW

  24. Shi C, Li Y, Yu PS, Wu B (2016) Constrained-meta-path-based ranking in heterogeneous information network. Knowl Inf Syst 49(2):719–747

  25. Sil A, Florian R (2016) One for all: towards language independent named entity linking. In: ACL, pp 2255–2264

  26. Solecki B, Silva L, Efimov D (2013) KDD cup 2013: author disambiguation. In: KDD Cup 2013 workshop, pp 9:1–9:3

  27. Sun Y, Han J, Yan X, Yu PS, Wu T (2011) PathSim: meta path-based top-k similarity search in heterogeneous information networks. In: PVLDB, vol 4, no 11, pp 992–1003

  28. Sun Y, Han J, Zhao P, Yin Z, Cheng H, Wu T (2009) Rankclus: integrating clustering with ranking for heterogeneous information network analysis. In: EDBT, pp 565–576

  29. Tang J (2016) Aminer: toward understanding big scholar data. In: WSDM, p 467

  30. Wang C, Zhang R, He X, Zhou A (2016) Error link detection and correction in Wikipedia. In: CIKM, pp 307–316

  31. Wang X, Tang J, Cheng H, Yu PS (2011) ADANA: active name disambiguation. In: ICDM, pp 794–803

  32. Yang Y, Chang M-W (2015) S-MART: novel tree-based structured learning algorithms applied to tweet entity linking. In: ACL-IJCNLP, pp 504–513

  33. Yin X, Han J, Yu PS (2007) Object distinction: distinguishing objects with identical names. In: ICDE, pp 1242–1246

  34. Zhang B, Dundar M, Al Hasan M (2016) Bayesian non-exhaustive classification. A case study: online name disambiguation using temporal record streams. In: CIKM, pp 1341–1350

  35. Zwicklbauer S, Seifert C, Granitzer M (2016) Robust and collective entity disambiguation through semantic embeddings. In: SIGIR, pp 425–434

Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000904. Chengyu Wang is partially supported by the Outstanding Doctoral Dissertation Cultivation Plan of Action under Grant No. YB2016040.

Author information

Corresponding author

Correspondence to Xiaofeng He.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, C., He, X. & Zhou, A. HEEL: exploratory entity linking for heterogeneous information networks. Knowl Inf Syst 62, 485–506 (2020). https://doi.org/10.1007/s10115-019-01354-1

  • DOI: https://doi.org/10.1007/s10115-019-01354-1
