Abstract
Existing distributed graph simulation algorithms scale poorly because they rely on heavy message passing. They also depend on the quality of the graph partitioning, which can become a bottleneck given the natural skew of real-world data, and this limits their degree of parallelism. In this paper, we propose an efficient parallel edge-centric approach for distributed graph pattern matching. We design a novel distributed data structure called ST that allows fine-grained parallelism and hence guarantees linear scalability. Based on ST, we develop a parallel graph simulation algorithm called PGSim. Furthermore, we propose PDSim, an edge-centric algorithm that efficiently evaluates dual simulation in parallel. PDSim combines ST and PGSim in a Split-and-Combine approach to accelerate the computation stages. We demonstrate the effectiveness and efficiency of these proposals through theoretical guarantees and extensive experiments on massive graphs. The results confirm that our approach outperforms existing algorithms by more than an order of magnitude.
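For readers unfamiliar with the two matching semantics named above, the following is a minimal sequential sketch of graph simulation and dual simulation as they are commonly defined in the literature. It is not the ST structure, PGSim, or PDSim, none of which are described in this abstract. The function name `simulation`, its parameters, and the toy graph are hypothetical and chosen only for illustration. A data vertex v matches a query vertex u if their labels agree and every query edge leaving u is mirrored by a data edge leaving v to a matching vertex; dual simulation adds the same requirement for incoming edges.

```python
from collections import defaultdict

def simulation(q_labels, q_edges, g_labels, g_edges, dual=False):
    """Naive fixpoint for the maximum (dual) simulation relation.

    Returns {query vertex: set of matching data vertices}.
    Illustrative only; names and signature are assumptions.
    """
    g_succ, g_pred = defaultdict(set), defaultdict(set)
    for v, w in g_edges:
        g_succ[v].add(w)
        g_pred[w].add(v)

    # Start from all label-compatible candidates.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # Child condition: v must have a successor that matches u2.
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u], changed = keep, True
            if dual:
                # Parent condition: v2 must have a predecessor that matches u.
                keep2 = {v2 for v2 in sim[u2] if g_pred[v2] & sim[u]}
                if keep2 != sim[u2]:
                    sim[u2], changed = keep2, True

    # By the usual convention, the query has no match at all
    # if any query vertex is left without candidates.
    if any(not s for s in sim.values()):
        return {u: set() for u in q_labels}
    return sim

# Toy example: query A -> B; data graph has one edge a1 -> b1,
# plus an isolated A-vertex (a2) and an isolated B-vertex (b2).
q_labels = {"qa": "A", "qb": "B"}
q_edges = [("qa", "qb")]
g_labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
g_edges = [("a1", "b1")]

print(simulation(q_labels, q_edges, g_labels, g_edges))
print(simulation(q_labels, q_edges, g_labels, g_edges, dual=True))
```

In the toy example, plain simulation keeps `b2` as a match for `qb` because only outgoing query edges are checked, whereas dual simulation discards it since `b2` has no matching parent. This refinement loop only illustrates the relations being computed; the edge-centric parallel evaluation proposed in the paper is a different mechanism.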
Acknowledgements
This work was supported by the Franco-Algerian program PHC Tassili BiGreen n° 18 MDU 111 and by the DGRSDT grant FNRSDT N° 253. The experiments presented in this work were carried out using the High Performance Computing Platform IBNBADIS provided by the Research Center on Scientific and Technical Information (CERIST, Algeria).
About this article
Cite this article
Bouhenni, S., Yahiaoui, S., Nouali-Taboudjemat, N. et al. Efficient parallel edge-centric approach for relaxed graph pattern matching. J Supercomput 78, 1642–1671 (2022). https://doi.org/10.1007/s11227-021-03938-7
DOI: https://doi.org/10.1007/s11227-021-03938-7