Efficient On-Chip Communication for Parallel Graph-Analytics on Spatial Architectures, arXiv - CS - Hardware Architecture

arXiv - CS - Hardware Architecture. Pub Date: 2021-08-26, DOI: arxiv-2108.11521
Khushal Sethi

Large-scale graph processing has drawn great attention in recent years. Most modern datacenter workloads, such as MapReduce, can be represented as graph processing. Consequently, many domain-specific accelerator designs have been proposed for graph processing. Spatial architectures are promising for graph processing: the graph is partitioned across several nodes, and each node works in parallel. We conduct experiments to analyze on-chip data movement during graph processing on a spatial architecture. Based on these observations, we identify a data movement bottleneck in the execution of such highly parallel accelerators. To mitigate the bottleneck, we propose a novel power-law-aware graph partitioning and data mapping scheme that reduces communication latency by minimizing hop counts on a scalable network-on-chip. Experimental results on popular graph algorithms show that, compared with a baseline implementation, our implementation makes execution 2-5x faster and 2.7-4x more energy-efficient by reducing data movement time.
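
The abstract does not spell out the partitioning and mapping scheme, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a 2D mesh network-on-chip with dimension-ordered (XY) routing, one vertex per tile, and a simple heuristic that places the highest-degree vertices on the most central tiles. All function names (`manhattan_hops`, `total_hops`, `naive_placement`, `degree_aware_placement`) are hypothetical.

```python
from collections import Counter
from itertools import product


def manhattan_hops(a, b):
    """Hop count between tiles a and b on a 2D mesh with XY routing (assumed)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_hops(edges, placement):
    """Sum of hop counts over all edges for a given vertex -> tile placement."""
    return sum(manhattan_hops(placement[u], placement[v]) for u, v in edges)


def naive_placement(edges, width, height):
    """Baseline: assign vertices to tiles in row-major order by vertex id."""
    vertices = sorted({v for edge in edges for v in edge})
    if len(vertices) > width * height:
        raise ValueError("more vertices than tiles")
    return {v: (i % width, i // width) for i, v in enumerate(vertices)}


def degree_aware_placement(edges, width, height):
    """Illustrative heuristic: highest-degree vertices go to the most central tiles."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if len(degree) > width * height:
        raise ValueError("more vertices than tiles")
    cx, cy = (width - 1) / 2, (height - 1) / 2
    # Tiles ordered by distance from the mesh center (ties broken by coordinates).
    tiles = sorted(product(range(width), range(height)),
                   key=lambda t: (abs(t[0] - cx) + abs(t[1] - cy), t))
    # Vertices ordered by decreasing degree (ties broken by vertex id).
    vertices = sorted(degree, key=lambda v: (-degree[v], v))
    return dict(zip(vertices, tiles))


# Toy power-law-like input: a star graph whose hub (vertex 0) touches every edge.
star = [(0, i) for i in range(1, 9)]
print(total_hops(star, naive_placement(star, 3, 3)))         # 18
print(total_hops(star, degree_aware_placement(star, 3, 3)))  # 12
```

On this toy star graph and 3x3 mesh, moving the hub from a corner tile to the center tile lowers the total hop count from 18 to 12. A real accelerator would map many vertices to each tile and would also have to balance load across tiles, which this sketch ignores.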

Updated: 2021-08-27
