DEM: Deep Entity Matching Across Heterogeneous Information Networks

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Heterogeneous information networks, which consist of multi-typed vertices representing objects and multi-typed edges representing relations between objects, are ubiquitous in the real world. In this paper, we study the problem of entity matching across heterogeneous information networks and propose a new method named DEM (short for Deep Entity Matching), which is based on distributed network embedding and a multi-layer perceptron with a highway network. In contrast to traditional entity matching methods, DEM uses the multi-layer perceptron with a highway network to explore hidden relations and thereby improve matching performance. Importantly, we integrate DEM with network embedding, which enables highly efficient vectorized computation. Because DEM models both the network structure and the entity attributes generically, it can be applied flexibly to various heterogeneous information networks. To illustrate its functionality, we apply DEM to two real-world entity matching applications: user linkage, which predicts the same or matched users across different social platforms in a social network analysis scenario, and record linkage, which predicts the same or matched records across different citation networks. Extensive experiments on real-world datasets demonstrate DEM’s effectiveness and rationality.
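
The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each entity is already represented by an embedding vector, that a candidate pair is formed by concatenating the two embeddings, and that a single highway layer in its standard form, y = T(x) * H(x) + (1 - T(x)) * x, feeds a logistic output unit. The names `highway_layer`, `match_score`, and `init_params`, the layer sizes, and the random initialization are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return weights @ x + bias for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H is a ReLU transform and T is a sigmoid gate; both keep the input size.
    """
    h = [max(0.0, z) for z in affine(w_h, b_h, x)]
    t = [sigmoid(z) for z in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def match_score(emb_a, emb_b, params):
    """Hypothetical matcher: concatenate two entity embeddings, apply one
    highway layer, then map the result to a match probability."""
    x = emb_a + emb_b  # list concatenation, an assumed way to form the pair
    hidden = highway_layer(x, params["w_h"], params["b_h"],
                           params["w_t"], params["b_t"])
    logit = sum(w * v for w, v in zip(params["w_out"], hidden)) + params["b_out"]
    return sigmoid(logit)


def init_params(dim, seed=0):
    """Small random weights for illustration only (untrained)."""
    rng = random.Random(seed)

    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    return {
        "w_h": matrix(), "b_h": [0.0] * dim,
        # A negative gate bias makes the layer start close to the identity.
        "w_t": matrix(), "b_t": [-1.0] * dim,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(dim)], "b_out": 0.0,
    }


# Usage: two 2-dimensional embeddings give a 4-dimensional pair vector.
params = init_params(dim=4)
score = match_score([0.2, -0.1], [0.3, 0.4], params)
```

With untrained weights the score carries no meaning; in practice the parameters would be learned from labeled matched and unmatched pairs. The gate is what distinguishes the highway layer from a plain perceptron layer: when T(x) is near 0 the input passes through unchanged, which is why a negative gate bias is a common starting point.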

Author information

Corresponding author

Correspondence to Chao Kong.

Electronic supplementary material

ESM 1

(PDF 245 kb)

Cite this article

Kong, C., Chen, BX. & Zhang, LP. DEM: Deep Entity Matching Across Heterogeneous Information Networks. J. Comput. Sci. Technol. 35, 739–750 (2020). https://doi.org/10.1007/s11390-020-0139-5

  • DOI: https://doi.org/10.1007/s11390-020-0139-5
