Towards distributed node similarity search on graphs

Zhang, Tianming; Gao, Yunjun; Zheng, Baihua; Chen, Lu; Wen, Shiting; Guo, Wei

doi:10.1007/s11280-020-00819-6

Published: 18 June 2020

World Wide Web, Volume 23, pages 3025–3053 (2020)

Abstract

Node similarity search on graphs has wide applications, such as recommendation and link prediction. However, existing studies are insufficient for two reasons: (i) the scale of real-world graphs is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficient distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of a preprocessing stage and a query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph, called the hybrid graph, which integrates node attribute similarity into the original graph. To divide graph vertices equally into subsets, KDC adopts KD-tree partitioning after pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the hybrid graph, we develop similarity search methods that use random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments on three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.
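As a rough illustration of the preprocessing idea, the sketch below maps objects of a metric space to vectors of distances to a few pivots and then splits the mapped vectors at coordinate medians, KD-tree style, so that every subset receives the same number of vertices. It is a single-machine toy under stated assumptions, not the KDC algorithm of the paper: the function names `pivot_map` and `kd_partition`, the choice of pivots, and the use of Euclidean distance are all hypothetical.

```python
import math


def pivot_map(objects, pivots, dist):
    """Map each object to its vector of distances to the pivots."""
    return [[dist(o, p) for p in pivots] for o in objects]


def kd_partition(ids, vectors, num_parts, depth=0):
    """Split ids into num_parts subsets by recursive median cuts.

    Assumes num_parts is a power of two. The cut axis cycles through
    the pivot dimensions, as in a standard KD-tree.
    """
    if num_parts <= 1:
        return [ids]
    axis = depth % len(vectors[0])
    ordered = sorted(ids, key=lambda i: vectors[i][axis])
    mid = len(ordered) // 2
    half = num_parts // 2
    return (kd_partition(ordered[:mid], vectors, half, depth + 1)
            + kd_partition(ordered[mid:], vectors, half, depth + 1))


# Toy data: 2-D points with Euclidean distance (an assumed metric).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6), (9, 0), (10, 1)]
pivots = [(0, 0), (10, 10)]  # hypothetical pivot choice
vectors = pivot_map(points, pivots, math.dist)
parts = kd_partition(list(range(len(points))), vectors, 4)
print(parts)
```

Cutting at the median rather than at a fixed coordinate value is what keeps the subsets equal in size, which matters when each subset is assigned to a different worker.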

Notes

As suggested in [13], the restart probability c is empirically set to 0.5.
Giraph is available at http://giraph.apache.org/.
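To make the role of the restart probability c from the first note concrete, the following is a minimal single-machine sketch of random walk with restart by power iteration. It uses one common formulation, r = (1 - c) W r + c e, where W is the column-normalized adjacency matrix and e is the indicator vector of the query node. The function name `rwr`, the dictionary-based graph encoding, and the handling of nodes without out-neighbors are assumptions for illustration. The distributed algorithms and pruning bounds of the paper are not reproduced here.

```python
def rwr(adj, seed, c=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart scores of all nodes relative to seed.

    adj maps each node to a list of its out-neighbors; every neighbor
    must itself be a key of adj. c is the restart probability.
    """
    nodes = list(adj)
    r = {v: 0.0 for v in nodes}
    r[seed] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        nxt[seed] = c  # restart mass: scores sum to 1, so c in total
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # Assumed convention: a dead-end walker jumps back to the seed.
                nxt[seed] += (1 - c) * r[u]
                continue
            share = (1 - c) * r[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r


# Undirected path a - b - c, queried from node "a" with c = 0.5.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = rwr(graph, "a", c=0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Nodes closer to the query node receive higher scores, so a top-k similarity query amounts to taking the first k entries of `ranking`. Each iteration shrinks the error by a factor of 1 - c, so a larger c converges faster but keeps the walk closer to the query node.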

References

[1] Batarfi, O., Shawi, R.E., Fayoumi, A.G., Nouri, R., Beheshti, S., Barnawi, A., Sakr, S.: Large scale graph processing systems: Survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)
[2] Batko, M., Kohoutková, P., Novak, D.: CoPhIR image collection under the microscope. In: SISAP, pp. 47–54 (2009)
[3] Boutet, A., Kermarrec, A., Mittal, N., Taïani, F.: Being prepared in a sparse world: The case of KNN graph construction. In: ICDE, pp. 241–252 (2016)
[4] Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: ICDE, pp. 591–602 (2015)
[5] Chen, L., Gao, Y., Chen, G., Zhang, H.: Metric all-k-nearest-neighbor search. IEEE Trans. Knowl. Data Eng. 28(1), 98–112 (2016)
[6] Chen, G., Yang, K., Chen, L., Gao, Y., Zheng, B., Chen, C.: Metric similarity joins using MapReduce. IEEE Trans. Knowl. Data Eng. 29(3), 656–669 (2017)
[7] Cheng, H., Zhou, Y., Yu, J.X.: Clustering large attributed graphs: A balance between structural and attribute similarities. TKDD 5(2), 12:1–12:33 (2011)
[8] Cohen, S., Kimelfeld, B., Koutrika, G.: A survey on proximity measures for social networks. In: Search Computing - Broadening Web Search, pp. 191–206 (2012)
[9] Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
[10] Dong, W., Charikar, M., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: WWW, pp. 577–586 (2011)
[11] Dong, Y., Zhang, J., Tang, J., Chawla, N.V., Wang, B.: CoupledLP: Link prediction in coupled networks. In: SIGKDD, pp. 199–208 (2015)
[12] Fujiwara, Y., Nakatsuji, M., Onizuka, M., Kitsuregawa, M.: Fast and exact top-k search for random walk with restart. PVLDB 5(5), 442–453 (2012)
[13] Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Mishima, T., Onizuka, M.: Efficient ad-hoc search for personalized PageRank. In: SIGMOD, pp. 445–456 (2013)
[14] Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: Graph processing in a distributed dataflow framework. In: OSDI, pp. 599–613 (2014)
[15] Jeh, G., Widom, J.: SimRank: A measure of structural-context similarity. In: SIGKDD, pp. 538–543 (2002)
[16] Khemmarat, S., Gao, L.: Fast top-k path-based relevance query on massive graphs. IEEE Trans. Knowl. Data Eng. 28(5), 1189–1202 (2016)
[17] Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed GraphLab: A framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)
[18] Ma, H., Zhu, J., Lyu, M.R., King, I.: Bridging the semantic gap between image contents and tags. IEEE Trans. Multimedia 12(5), 462–473 (2010)
[19] Maehara, T., Akiba, T., Iwata, Y., Kawarabayashi, K.: Computing personalized PageRank quickly by exploiting graph structures. PVLDB 7(12), 1023–1034 (2014)
[20] Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)
[21] Meng, F., Rui, X., Wang, Z., Xing, Y., Cao, L.: Coupled node similarity learning for community detection in attributed networks. Entropy 20(6), 471 (2018)
[22] Pan, J., Yang, H., Faloutsos, C., Duygulu, P.: Automatic multimedia cross-modal correlation discovery. In: SIGKDD, pp. 653–658 (2004)
[23] Plaku, E., Kavraki, L.E.: Distributed computation of the kNN graph for large high-dimensional point sets. J. Parallel Distrib. Comput. 67(3), 346–359 (2007)
[24] Sarkar, P., Moore, A.W.: Fast nearest-neighbor search in disk-resident graphs. In: SIGKDD, pp. 513–522 (2010)
[25] Sarkar, P., Moore, A.W.: A tractable approach to finding closest truncated-commute-time neighbors in large graphs. arXiv:1206.5259 (2012)
[26] Shao, B., Wang, H., Li, Y.: Trinity: A distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)
[27] Shin, K., Jung, J., Sael, L., Kang, U.: BEAR: Block elimination approach for random walk with restart on large graphs. In: SIGMOD, pp. 1571–1585 (2015)
[28] Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7(3), 193–204 (2013)
[29] Trad, M.R., Joly, A., Boujemaa, N.: Distributed kNN-graph approximation via hashing. In: ICMR, p. 43 (2012)
[30] Wu, Y., Jin, R., Zhang, X.: Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In: SIGMOD, pp. 1139–1150 (2014)
[31] Xu, G., Fu, B., Gu, Y.: Point-of-interest recommendations via a supervised random walk algorithm. IEEE Intell. Syst. 31(1), 15–23 (2016)
[32] Yang, D., Zhang, D., Qu, B.: Participatory cultural mapping based on collective behavior data in location-based social networks. ACM TIST 7(3), 30 (2016)
[33] Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: NSDI, pp. 15–28 (2012)
[34] Zhang, C., Shou, L., Chen, K., Chen, G., Bei, Y.: Evaluating geo-social influence in location-based social networks. In: CIKM, pp. 1442–1451 (2012)
[35] Zhang, Q., Li, M., Deng, Y., Mahadevan, S.: Measure the similarity of nodes in the complex networks. arXiv:1502.00780 (2015)
[36] Zhang, Y., Huang, K., Geng, G., Liu, C.: Fast kNN graph construction with locality sensitive hashing. In: PKDD, pp. 660–674 (2013)
[37] Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)

Acknowledgments

This work was supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003, the NSFC under Grants No. 61972338 and 61802344, the NSFC-Zhejiang Joint Fund under Grant No. U1609217, and the ZJU-Hikvision Joint Project. Yunjun Gao is the corresponding author of the work.

Author information

Authors and Affiliations

College of Computer Science and Software Engineering, Zhejiang University of Technology, Hangzhou, China
Tianming Zhang, Yunjun Gao & Wei Guo
School of Information Systems, Singapore Management University, Singapore, Singapore
Baihua Zheng
Department of Computer Science, Aalborg University, Aalborg, Denmark
Lu Chen
The Ningbo Institute of Technology, Zhejiang University, Ningbo, China
Shiting Wen

Corresponding author

Correspondence to Yunjun Gao.

About this article

Cite this article

Zhang, T., Gao, Y., Zheng, B. et al. Towards distributed node similarity search on graphs. World Wide Web 23, 3025–3053 (2020). https://doi.org/10.1007/s11280-020-00819-6

Received: 01 October 2018
Revised: 15 January 2020
Accepted: 22 April 2020
Published: 18 June 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s11280-020-00819-6
