Efficient FPGA-based graph processing with hybrid pull-push computational model
Frontiers of Computer Science (IF 4.2), Pub Date: 2020-01-03, DOI: 10.1007/s11704-019-9020-5
Chengbo Yang, Long Zheng, Chuangyi Gui, Hai Jin

A hybrid pull-push computational model can deliver better results than either the pull or the push model alone when processing real-world graphs. The programmability and pipeline parallelism of FPGAs make them well suited to processing the different stages of graph iterations. Nevertheless, given the limited on-chip resources and the streamlined pipeline computation, the efficiency of the hybrid model on FPGAs often suffers from the well-known random access pattern of graph processing. In this paper, we present a hybrid graph processing system on FPGAs that achieves the best of both worlds. Our approach on FPGAs is novel in two respects. First, we propose the edge block (consisting of edges with the same destination vertex set), which allows edges to be accessed sequentially at block granularity for locality while still preserving precision. Because blocks are independent, in the sense that all edges in an inactive block are associated with inactive vertices, this also makes it possible to skip invalid blocks and so reduce redundant computation. Second, we consider a large number of vertices and their associated edge blocks to maintain a predictable execution history. We also propose switching models in advance, with few stalls, using their state statistics. Our evaluation on a wide variety of graph algorithms and many real-world graphs shows that our approach achieves up to 3.69x speedup over state-of-the-art FPGA-based graph processing systems.
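
The edge-block idea in the abstract can be illustrated with a small software sketch. This is not the authors' FPGA design. It assumes that a block groups the edges whose destinations fall in one fixed-size vertex interval, and that a block is inactive in an iteration when none of its edges has an active source vertex. The names `build_edge_blocks`, `bfs_with_edge_blocks`, and `block_size` are hypothetical and chosen only for this illustration, which uses breadth-first search as the example algorithm.

```python
from collections import defaultdict


def build_edge_blocks(edges, block_size):
    """Group edges into blocks by destination interval (an assumed layout).

    Block b holds every edge whose destination vertex lies in
    [b * block_size, (b + 1) * block_size). Also records, for each
    source vertex, which blocks contain at least one of its edges.
    """
    blocks = defaultdict(list)
    blocks_of_source = defaultdict(set)
    for src, dst in edges:
        block_id = dst // block_size
        blocks[block_id].append((src, dst))
        blocks_of_source[src].add(block_id)
    return dict(blocks), dict(blocks_of_source)


def bfs_with_edge_blocks(edges, num_vertices, root, block_size=2):
    """BFS that scans edges block by block and skips inactive blocks.

    Returns (level, skipped): the BFS level of every vertex (-1 if
    unreachable) and the total number of block visits that were skipped.
    """
    blocks, blocks_of_source = build_edge_blocks(edges, block_size)
    level = [-1] * num_vertices
    level[root] = 0
    frontier = {root}
    depth = 0
    skipped = 0

    while frontier:
        # A block is active only if some frontier vertex has an edge in it.
        active_blocks = set()
        for v in frontier:
            active_blocks |= blocks_of_source.get(v, set())
        skipped += len(blocks) - len(active_blocks)

        next_frontier = set()
        for block_id in sorted(active_blocks):
            # Edges inside a block are read sequentially.
            for src, dst in blocks[block_id]:
                if src in frontier and level[dst] == -1:
                    level[dst] = depth + 1
                    next_frontier.add(dst)

        frontier = next_frontier
        depth += 1

    return level, skipped


if __name__ == "__main__":
    sample_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    levels, skipped_blocks = bfs_with_edge_blocks(sample_edges, 5, root=0)
    print(levels, skipped_blocks)
```

In this sketch the per-source block index stands in for whatever activity tracking the hardware uses. It only shows why inactive blocks can be skipped without touching their edges.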

Updated: 2020-01-03