
Efficient distributed reachability querying of massive temporal graphs

  • Regular Paper
The VLDB Journal

Abstract

Reachability computation is a fundamental graph functionality with a wide range of applications. Nevertheless, little work has been done on efficient reachability queries over temporal graphs, which are used extensively to model time-varying networks such as communication networks, social networks, and transportation schedule networks. Moreover, real-world temporal networks are growing increasingly large and may be distributed across multiple data centers. These observations motivate this paper’s study of efficient reachability queries on distributed temporal graphs. We propose an efficient index, called Temporal Vertex Labeling (TVL), which is a labeling scheme for distributed temporal graphs. We also present algorithms that exploit TVL to support efficient distributed reachability querying over temporal graphs in Pregel-like systems. The algorithms employ several optimizations that rest on non-trivial lemmas. Extensive experiments on massive real and synthetic temporal graphs provide detailed insight into the efficiency and scalability of the proposed methods, covering both index construction and query processing. Compared with state-of-the-art methods, the TVL-based query algorithms achieve up to an order-of-magnitude speedup with lower index construction overhead.
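To make the notion of a temporal reachability query concrete, the following is a minimal single-machine sketch in Python. It is not the TVL index or any algorithm from the paper, and it is not distributed. It assumes a common formulation in which each edge carries a departure time and a strictly later arrival time, and a vertex is reachable if some path uses edges in time-respecting order within a query interval. The edge format, the function name `temporally_reachable`, and the parameters `t_start` and `t_end` are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge format (an assumption, not taken from the paper):
# (source, target, departure_time, arrival_time), with arrival > departure.
TemporalEdge = Tuple[Hashable, Hashable, int, int]


def temporally_reachable(
    edges: Iterable[TemporalEdge],
    src: Hashable,
    dst: Hashable,
    t_start: int,
    t_end: int,
) -> bool:
    """Return True if dst is reachable from src via a time-respecting path
    that departs no earlier than t_start and arrives no later than t_end."""
    if src == dst:
        return True

    edge_list = list(edges)
    for _, _, dep, arr in edge_list:
        # The single scan below is only correct for positive edge durations.
        if arr <= dep:
            raise ValueError("each edge must arrive strictly after it departs")

    # earliest[v] = earliest known arrival time at v (src is available at t_start).
    earliest: Dict[Hashable, int] = {src: t_start}

    # Scan edges in order of departure time. Because every edge has positive
    # duration, any edge that could enable the current one was scanned earlier.
    for u, v, dep, arr in sorted(edge_list, key=lambda e: e[2]):
        if arr > t_end:
            continue  # would arrive after the query interval closes
        if u in earliest and earliest[u] <= dep:
            if arr < earliest.get(v, float("inf")):
                earliest[v] = arr

    return dst in earliest


if __name__ == "__main__":
    # a -> b departs at 1, b -> c departs at 3, c -> d departs at 2.
    demo = [("a", "b", 1, 2), ("b", "c", 3, 4), ("c", "d", 2, 3)]
    print(temporally_reachable(demo, "a", "c", 0, 10))
    print(temporally_reachable(demo, "a", "d", 0, 10))
```

In the demo graph, `c` is reachable from `a`, but `d` is not: the only edge into `d` departs from `c` at time 2, before `c` can first be reached at time 4. An ordinary (non-temporal) reachability test would report `d` as reachable, which is why temporal graphs call for dedicated indexes and algorithms.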

Notes

  1. Giraph is available at http://giraph.apache.org/.

  2. Hama is available at http://hama.apache.org/.

  3. KONECT is available at konect.uni-koblenz.de/.

  4. GTgraph is available at http://www.cse.psu.edu/~kxm85/software/GTgraph/.

  5. GTimer is available at http://www.cse.cuhk.edu.hk/systems/graph/Gtimer/index.html.

  6. Code of Grail is available at https://code.google.com/archive/p/grail/.

  7. Austin is available at https://code.google.com/archive/p/googletransitdatafeed/wikis/PublicFeeds.wiki.


Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003, the NSFC under Grant No. 61972338, the NSFC-Zhejiang Joint Fund under Grant No. U1609217, the ZJU-Hikvision Joint Project, and the National Research Foundation, Prime Minister’s Office, Singapore under its International Research Centres in Singapore Funding Initiative. Yunjun Gao is the corresponding author of the work.

Author information

Corresponding author

Correspondence to Yunjun Gao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, T., Gao, Y., Chen, L. et al. Efficient distributed reachability querying of massive temporal graphs. The VLDB Journal 28, 871–896 (2019). https://doi.org/10.1007/s00778-019-00572-x

  • DOI: https://doi.org/10.1007/s00778-019-00572-x
