Abstract
A heterogeneous information network (HIN) is a ubiquitous data model consisting of multiple types of entities and relations. Entity names in HINs are inherently ambiguous, which makes it difficult to fully disambiguate a HIN. In this paper, we introduce the task of exploratory entity linking for HINs. Given a partially disambiguated HIN, we aim to link ambiguous names to disambiguated entities in the HIN when their referent entities are present. We also “explore” alternatives by discovering new entities and adding them to the HIN. We propose a partial classification EM-based approach to address this task. We present a constrained probability propagation model to link surface names to entities in the HIN, and we model the new entity detection process as a maximum edge weight clique problem. Experiments show that our method outperforms state-of-the-art methods for entity linking with HINs and for author name disambiguation.
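The abstract casts new entity detection as a maximum edge weight clique problem but does not give the formulation. As a minimal sketch of that problem class only (not HEEL's actual detection procedure), the exhaustive search below finds, in a small undirected graph with real-valued edge weights, the clique whose edges have the largest total weight. Reading vertices as unlinkable name occurrences and weights as pairwise similarity scores is an assumption, as are the function name `max_edge_weight_clique` and the toy weights. The search is exponential in the number of vertices; Alidaee et al. (2007), in the reference list, solve the problem via unconstrained quadratic programming instead.

```python
from itertools import combinations


def max_edge_weight_clique(n, weights):
    """Exhaustively find a clique that maximizes the sum of its edge weights.

    n: number of vertices, labeled 0..n-1.
    weights: dict mapping frozenset({u, v}) -> edge weight; missing pairs are non-edges.
    Returns (best_weight, best_clique); (0.0, ()) if no clique has positive weight.
    Exponential in n, so only suitable for tiny illustrative graphs.
    """
    best_weight, best_clique = 0.0, ()
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            pairs = [frozenset(p) for p in combinations(subset, 2)]
            # A vertex subset is a clique only if every pair in it is joined by an edge.
            if not all(p in weights for p in pairs):
                continue
            total = sum(weights[p] for p in pairs)
            if total > best_weight:
                best_weight, best_clique = total, subset
    return best_weight, best_clique


# Hypothetical toy graph: edge weights stand in for pairwise similarity scores.
toy = {
    frozenset({0, 1}): 0.9,
    frozenset({0, 2}): 0.8,
    frozenset({1, 2}): 0.7,
    frozenset({2, 3}): 0.95,
}
best_weight, best_clique = max_edge_weight_clique(4, toy)
```

On this toy graph, the triangle (0, 1, 2) wins with a total weight of about 2.4, beating the single heaviest edge {2, 3}. With only positive weights, larger cliques tend to be favored; how the paper weights or constrains edges is not described in this section.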
Notes
In meta-path description, we use P, A, T, V and Y to represent any nodes (i.e., entities) in the FDS with the type of paper, author, term, venue and year, respectively.
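To illustrate this notation, a meta-path such as A–P–V–P–A (two authors who published papers in the same venue) can be evaluated by multiplying the adjacency matrices of its relations. The result is a commuting matrix whose entry (i, j) counts path instances between authors i and j. This is the construction used by PathSim (Sun et al., 2011, in the reference list), which scores a symmetric meta-path as 2M_ij / (M_ii + M_jj). The sketch below is a hypothetical illustration: the toy author–paper and paper–venue matrices and the helper names are assumptions, and this section does not say which meta-paths HEEL uses or how it scores them.

```python
# Hypothetical toy HIN: 3 authors (A), 3 papers (P), 2 venues (V).
# AP[i][j] = 1 if author i wrote paper j; PV[j][k] = 1 if paper j appeared in venue k.
AP = [[1, 1, 0],
      [0, 1, 0],
      [0, 0, 1]]
PV = [[1, 0],
      [0, 1],
      [1, 0]]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns, i.e. reverse the direction of a relation."""
    return [list(col) for col in zip(*m)]


def commuting_matrix(relations):
    """Chain the adjacency matrices along a meta-path.

    Entry (i, j) of the result counts path instances from node i of the
    first type to node j of the last type.
    """
    result = relations[0]
    for rel in relations[1:]:
        result = matmul(result, rel)
    return result


def pathsim(m, x, y):
    """PathSim score for a symmetric meta-path with commuting matrix m."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


APA = commuting_matrix([AP, transpose(AP)])                       # co-authorship
APVPA = commuting_matrix([AP, PV, transpose(PV), transpose(AP)])  # same venue
```

Here `APA` is `[[2, 1, 0], [1, 1, 0], [0, 0, 1]]`. `APVPA` also connects authors 0 and 2 through venue 0 even though they share no paper, and `pathsim(APVPA, 0, 1)` is 2/3.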
To our knowledge, some other EL methods, such as [23], also consider the NIL issue. However, they link mentions in plain text to entities in knowledge bases, and it is not straightforward to adapt them to EL with HINs.
There are no unlinkable records for the remaining two author names.
References
Alidaee B, Glover F, Kochenberger GA, Wang H (2007) Solving the maximum edge weight clique problem via unconstrained quadratic programming. Eur J Oper Res 181(2):592–597
Bagga A, Baldwin B (1998) Entity-based cross-document coreferencing using the vector space model. In: ACL-COLING, pp 79–85
Bunescu RC, Pasca M (2006) Using encyclopedic knowledge for named entity disambiguation. In: EACL
Carmel D, Chang M-W, Gabrilovich E, Hsu B-JP, Wang K (2014) ERD’14: entity recognition and disambiguation challenge. In: SIGIR Forum, vol 48, no 2, pp 63–77
Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–332
Chiang M-F, Liou J-J, Wang J-L, Peng W-C, Shan M-K (2013) Exploring heterogeneous information networks and random walk with restart for academic search. Knowl Inf Syst 36(1):59–82
Cornolti M, Ferragina P, Ciaramita M, Rüd S, Schütze H (2016) A piggyback system for joint entity mention detection and linking in web queries. In: WWW, pp 567–578
Dalvi BB, Cohen WW, Callan J (2013) Exploratory learning. In: ECML-PKDD, pp 128–143
Ferreira AA, Gonçalves MA, Laender AHF (2012) A brief survey of automatic methods for author name disambiguation. In: SIGMOD Record, vol 41, no 2, pp 15–26
Ganea O-E, Ganea M, Lucchi A, Eickhoff C, Hofmann T (2016) Probabilistic bag-of-hyperlinks model for entity linking. In: WWW, pp 927–938
Han X, Sun L, Zhao J (2011) Collective entity linking in web text: a graph-based method. In: SIGIR, pp 765–774
Kanani PH, McCallum A, Pal C (2007) Improving author coreference by resource-bounded information gathering from the web. In: IJCAI, pp 429–434
Lao N, Cohen WW (2010) Relational retrieval using a combination of path-constrained random walks. Mach Learn 81(1):53–67
Li C, Cheung WK, Ye Y, Zhang X, Chu D-H, Li X (2015) The author-topic-community model for author interest profiling and community discovery. Knowl Inf Syst 44(2):359–383
Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. In: PVLDB, vol 4, no 11, pp 956–967
Li S, Cong G, Miao C (2012) Author name disambiguation using a new categorical distribution similarity. In: ECML-PKDD, pp 569–584
Li Y, Tan S, Sun H, Han J, Roth D, Yan X (2016) Entity disambiguation with linkless knowledge bases. In: WWW, pp 1261–1270
Pitts M, Savvana S, Roy SB, Mandava V (2014) ALIAS: author disambiguation in Microsoft academic search engine dataset. In: EDBT, pp 648–651
Qian Y, Hu Y, Cui J, Zheng Q, Nie Z (2011) Combining machine learning and human judgment in author disambiguation. In: CIKM, pp 1241–1246
Shen W, Han J, Wang J (2014) A probabilistic model for linking named entities in web text with heterogeneous information networks. In: SIGMOD, pp 1199–1210
Shen W, Wang J, Han J (2015) Entity linking with a knowledge base: issues, techniques, and solutions. TKDE 27(2):443–460
Shen W, Wang J, Luo P, Wang M (2012) LIEGE: link entities in web lists with knowledge base. In: KDD, pp 1424–1432
Shen W, Wang J, Luo P, Wang M (2012) LINDEN: linking named entities with knowledge base via semantic knowledge. In: WWW
Shi C, Li Y, Yu PS, Wu B (2016) Constrained-meta-path-based ranking in heterogeneous information network. Knowl Inf Syst 49(2):719–747
Sil A, Florian R (2016) One for all: towards language independent named entity linking. In: ACL, pp 2255–2264
Solecki B, Silva L, Efimov D (2013) KDD cup 2013: author disambiguation. In: KDD Cup 2013 workshop, pp 9:1–9:3
Sun Y, Han J, Yan X, Yu PS, Wu T (2011) PathSim: meta path-based top-k similarity search in heterogeneous information networks. In: PVLDB, vol 4, no 11, pp 992–1003
Sun Y, Han J, Zhao P, Yin Z, Cheng H, Wu T (2009) RankClus: integrating clustering with ranking for heterogeneous information network analysis. In: EDBT, pp 565–576
Tang J (2016) AMiner: toward understanding big scholar data. In: WSDM, p 467
Wang C, Zhang R, He X, Zhou A (2016) Error link detection and correction in Wikipedia. In: CIKM, pp 307–316
Wang X, Tang J, Cheng H, Yu PS (2011) ADANA: active name disambiguation. In: ICDM, pp 794–803
Yang Y, Chang M-W (2015) S-MART: novel tree-based structured learning algorithms applied to tweet entity linking. In: ACL-IJCNLP, pp 504–513
Yin X, Han J, Yu PS (2007) Object distinction: distinguishing objects with identical names. In: ICDE, pp 1242–1246
Zhang B, Dundar M, Al Hasan M (2016) Bayesian non-exhaustive classification. A case study: online name disambiguation using temporal record streams. In: CIKM, pp 1341–1350
Zwicklbauer S, Seifert C, Granitzer M (2016) Robust and collective entity disambiguation through semantic embeddings. In: SIGIR, pp 425–434
Acknowledgements
This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000904. Chengyu Wang is partially supported by the Outstanding Doctoral Dissertation Cultivation Plan of Action under Grant No. YB2016040.
Cite this article
Wang, C., He, X. & Zhou, A. HEEL: exploratory entity linking for heterogeneous information networks. Knowl Inf Syst 62, 485–506 (2020). https://doi.org/10.1007/s10115-019-01354-1
DOI: https://doi.org/10.1007/s10115-019-01354-1