Grus, ACM Transactions on Architecture and Code Optimization

Grus
ACM Transactions on Architecture and Code Optimization (IF 1.5), Pub Date: 2021-02-10, DOI: 10.1145/3444844
Pengyu Wang₁, Jing Wang₁, Chao Li₁, Jianzong Wang₂, Haojin Zhu₁, Minyi Guo₁

Today’s GPU graph processing frameworks face scalability and efficiency issues when the graph size exceeds the GPU's dedicated memory limit. Although recent GPUs can over-subscribe memory with Unified Memory (UM), they incur significant overhead when handling graph-structured data. In addition, many popular processing frameworks suffer from sub-optimal efficiency due to heavy atomic operations when tracking the active vertices. This article presents Grus, a novel system framework that allows GPU graph processing to stay competitive with ever-growing graph complexity. Grus improves space efficiency through a UM trimming scheme tailored to the data access behaviors of graph workloads. It also uses a lightweight frontier structure to further reduce atomic operations. With an easy-to-use interface that abstracts the above details, Grus shows up to 6.4× average speedup over the state-of-the-art in-memory GPU graph processing framework. It allows one to process large graphs of 5.5 billion edges in seconds with a single GPU.
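
The abstract does not say how Grus's frontier structure is built, so the following is only a generic illustration of why frontier tracking can involve atomics, not Grus's implementation. A common way to track active vertices is a queue, where every newly discovered vertex is appended through an atomically incremented shared counter. A dense flag array is one possible alternative: marking a vertex is an idempotent write, so no shared counter is needed. The sketch below simulates a level-synchronous BFS sequentially in Python over a CSR graph; the function name `bfs_flag_frontier` and its parameters are hypothetical, chosen for this example.

```python
from typing import List


def bfs_flag_frontier(row_offsets: List[int],
                      col_indices: List[int],
                      source: int) -> List[int]:
    """Level-synchronous BFS over a CSR graph using boolean-flag frontiers.

    Illustrative only (assumed design, not taken from the paper):
    the frontier is one flag per vertex instead of an append-only queue.
    Returns the BFS depth of each vertex, or -1 if unreachable.
    """
    n = len(row_offsets) - 1
    depth = [-1] * n
    depth[source] = 0

    current = [False] * n          # flags for the active vertices
    current[source] = True
    level = 0

    while any(current):
        nxt = [False] * n          # flags for the next level
        for v in range(n):         # on a GPU this loop would be one thread per vertex
            if not current[v]:
                continue
            # Visit the out-neighbours of v stored in the CSR arrays.
            for e in range(row_offsets[v], row_offsets[v + 1]):
                u = col_indices[e]
                if depth[u] == -1:
                    depth[u] = level + 1
                    nxt[u] = True  # idempotent write, no shared queue counter
        current = nxt
        level += 1
    return depth


# Small example graph: 0->1, 0->2, 1->3, 2->3, and vertex 4 is isolated.
row_offsets = [0, 2, 3, 4, 4, 4]
col_indices = [1, 2, 3, 3]
print(bfs_flag_frontier(row_offsets, col_indices, 0))  # prints [0, 1, 1, 2, -1]
```

The trade-off in this sketch is that every level scans all vertex flags, which costs more when the frontier is small; real frameworks typically weigh that against the cost of atomic queue appends.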

Updated: 2021-02-10
