Skip to main content
Log in

Efficient FPGA-based graph processing with hybrid pull-push computational model

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world graphs. Programmability and pipeline parallelism of FPGAs make it potential to process different stages of graph iterations. Nevertheless, considering the limited on-chip resources and streamline pipeline computation, the efficiency of hybrid model on FPGAs often suffers due to well-known random access feature of graph processing. In this paper, we present a hybrid graph processing system on FPGAs, which can achieve the best of both worlds. Our approach on FPGAs is unique and novel as follow. First, we propose to use edge block (consisting of edges with the same destination vertex set), which allows to sequentially access edges at block granularity for locality while still preserving the precision. Due to the independence of blocks in the sense that all edges in an inactive block are associated with inactive vertices, this also enables to skip invalid blocks for reducing redundant computation. Second, we consider a large number of vertices and their associated edge-blocks to maintain a predictable execution history. We also present to switch models in advance with few stalls using their state statistics. Our evaluation on a wide variety of graph algorithms for many real-world graphs shows that our approach achieves up to 3.69x speedup over state-of-the-art FPGA-based graph processing systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Lumsdaine A, Gregor D, Hendrickson B, Hendrickson B, Berry J. Challenges in parallel graph processing. Parallel Processing Letters, 2007, 17(1): 5–20

    Article  MathSciNet  Google Scholar 

  2. Jin H, Yao P, Liao X. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60(12): 274–276

    Google Scholar 

  3. Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD International Conference on Management of Data. 2010, 135–146

  4. Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein J M. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727

    Article  Google Scholar 

  5. Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 17–30

  6. Kyrola A, Blelloch G E, Guestrin C. GraphChi: large-scale graph computation on just a PC disk-based graph computation. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 31–46

  7. Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488

  8. Beamer S, Asanovic K, Patterson D. Direction-optimizing breadth-first search. Scientific Programming, 2013, 21(3–4): 137–148

    Article  Google Scholar 

  9. Shun J, Blelloch G E. Ligra: a lightweight graph processing framework for shared memory. In: Proceedings of SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2013, 135–146

  10. Nguyen D, Lenharth A, Pingali K. A lightweight infrastructure for graph analytics. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 456–471

  11. Han W S, Lee S, Park K, Lee J H, Kim M S, Kim J, Yu H. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 77–85

  12. Gonzalez J E, Xin R S, Dave A, Crankshaw D, Franklin M J, Stoica I. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2014, 599–613

  13. Zhu X, Han W, Chen W. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of USENIX Annual Technical Conference. 2015, 375–386

  14. Sengupta D, Song S L, Agarwal K, Schwan K. GraphReduce: processing large-scale graphs on accelerator-based systems. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 28

  15. Chi Y, Dai G, Wang Y, Sun G, Li G, Yang H. NXgraph: an efficient graph processing system on a single machine. In: Proceedings of the 32nd International Conference on Data Engineering. 2016, 409–420

  16. Zhu X, Chen W, Zheng W, Ma X. Gemini: a computation-centric distributed graph processing system. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2016, 301–316

  17. Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. 2015, 410–424

  18. Khorasani F, Vora K, Gupta R, Bhuyan L N. CuSha: vertex-centric graph processing on GPUs. In: Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. 2014, 239–252

  19. Fu Z, Personick M, Thompson B. MapGraph: a high level API for fast development of high performance graph analytics on GPUs. In: Proceedings of Workshop on GRAph Data Management Experiences and Systems. 2014, 1–6

  20. Liu H, Huang H H. Enterprise: breadth-first graph traversal on GPUs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 1–12

  21. Ma L, Yang Z, Chen H, Xue J, Dai Y. Garaph: efficient GPU-accelerated graph processing on a single machine with balanced replication. In: Proceedings of USENIX Annual Technical Conference. 2017, 195–207

  22. Ahn J, Hong S, Yoo S, Mutlu O, Choi K. A scalable processing-in-memory accelerator for parallel graph processing. In: Proceedings of International Symposium on Computer Architecture. 2015, 105–117

  23. Ham T J, Wu L, Sundaram N, Satish N, Martonosi M. Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture. 2016, 1–13

  24. Song L, Zhuo Y, Qian X, Li H, Chen Y. GraphR: accelerating graph processing using ReRAM. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 531–543

  25. Zhang M, Zhuo Y, Wang C, Gao M, Wu Y, Chen K, Kozyrakis C, Qian X. GraphP: reducing communication for PIM-based graph processing with efficient data partition. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 544–557

  26. Nurvitadhi E, Weisz G, Wang Y, Hurkat S, Nguyen M, Hoe J C, Martinez J F, Guestrin C. GraphGen: an FPGA framework for vertex-centric graph computation. In: Proceedings of the 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. 2014, 25–28

  27. Umuroglu Y, Morrison D, Jahre M. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In: Proceedings of the 25th International Conference on Field Programmable Logic and Applications. 2015, 1–8

  28. Zhou S, Chelmis C, Prasanna V K. High-throughput and energy-efficient graph processing on FPGA. In: Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. 2016, 103–110

  29. Oguntebi T, Olukotun K. GraphOps: a dataflow library for graph analytics acceleration. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 111–117

  30. Dai G, Chi Y, Wang Y, Yang H. FPGP: graph processing framework on FPGA a case study of breadth-first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 105–110

  31. Dai G, Huang T, Chi Y, Xu N, Wang Y, Yang H. ForeGraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 217–226

  32. Zhang J, Khoram S, Li J. Boosting the performance of FPGA-based graph processor using hybrid memory cube: a case for breadth first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 207–216

  33. Zhang J, Li J. Degree-aware hybrid graph traversal on FPGA-HMC platform. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2018, 229–238

  34. Engelhardt N, So H K H. GraVF: a vertex-centric distributed graph processing framework on FPGAs. In: Proceedings of the 26th International Conference on Field-Programmable Custom Computing Machines. 2016, 1–4

  35. Zhou S, Prasanna V K. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In: Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing. 2017, 137–144

  36. Intel Altera. Altera SDK for OpenCL programming guide. Intel Altera, 2013

  37. Intel Altera. Altera SDK for OpenCL best practices guide version 16.0. Intel Altera, 2016

  38. McCune R R, Weninger T, Madey G. Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Computing Surveys, 2015, 48(2): 25

    Article  Google Scholar 

  39. Watts D J, Strogatz S H. Collective dynamics of small-world networks. Nature, 1998, 393(6684): 440

    Article  Google Scholar 

  40. Li A S, Li X C, Pan Y C, Zhang W. Strategies for network security. Science China Information Sciences, 2015, 58(1): 1–14

    Article  Google Scholar 

  41. Langr D, Tvrdik P. Evaluation criteria for sparse matrix storage formats. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 428–440

    Article  Google Scholar 

  42. Wang Z, He B, Zhang W, Jiang S. A performance analysis framework for optimizing OpenCL applications on FPGAs. In: Proceedings of the International Conference on High Performance Computer Architecture. 2016, 114–125

  43. Aaftab M. The OpenCL specification version 1.0. Khronos OpenCL Working Group, 2009

  44. Leskovec J, Krevl A. Snap datasets: stanford large network dataset collection. 2014

Download references

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2018YFB1003502), the National Natural Science Foundation of China (Grant Nos. 61825202, 61832006, and 61702201).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Long Zheng.

Additional information

Chengbo Yang is now pursuing his master degree in the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST), China. His research interests include graph processing and reinforcement learning.

Long Zheng is now a postdoctoral researcher in the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST), China. He received his PhD degree at HUST in 2016. His current research interests include program analysis, runtime systems, and configurable computer architecture with a particular focus on graph processing.

Chuangyi Gui is currently a PhD candidate in the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST), China. His current research interests include graph processing and reconfigurable computing.

Hai Jin is a Cheung Kung Scholars Chair Professor of computer science and engineering at Huazhong University of Science and Technology (HUST), China. He received his PhD in computer engineering from HUST in 1994. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz in Germany. He worked at The University of Hong Kong, China between 1998 and 2000, and as a visiting scholar at the University of Southern California, USA between 1999 and 2000. He was awarded Excellent Youth Award from the National Science Foundation of China in 2001. He is the chief scientist of ChinaGrid, the largest grid computing project in China, and the chief scientists of National 973 Basic Research Program Project of Virtualization Technology of Computing System, and Cloud Security. He is an IEEE Fellow and a member of the ACM. He has co-authored 15 books and published over 600 research papers. His research interests include computer architecture, virtualization technology, cluster computing, and cloud computing, peer-to-peer computing, network storage, and network security.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yang, C., Zheng, L., Gui, C. et al. Efficient FPGA-based graph processing with hybrid pull-push computational model. Front. Comput. Sci. 14, 144102 (2020). https://doi.org/10.1007/s11704-019-9020-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-019-9020-5

Keywords

Navigation