Reduce Operations: Send Volume Balancing While Minimizing Latency, IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-06-01, DOI: 10.1109/tpds.2020.2964536
M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat

The communication hypergraph model was proposed in a two-phase setting to encapsulate multiple communication cost metrics (bandwidth and latency), both of which have proven important in parallelizing irregular applications. In the first phase, computational tasks are assigned to processors with the objective of minimizing the total communication volume while maintaining computational load balance. In the second phase, communication tasks are assigned to processors with the objective of minimizing the total number of messages while maintaining communication-volume balance. However, the reduce-communication hypergraph model fails to correctly encapsulate send-volume balancing. We propose a novel vertex weighting scheme that enables part weights to correctly encode the send-volume loads of processors for send-volume balancing. The model also increases the total communication volume during partitioning. To reduce this increase, we propose a method that utilizes the recursive bipartitioning framework and refines each bipartition by vertex swaps. For performance evaluation, we consider column-parallel SpMV, one of the most widely known applications in which the reduce-task assignment problem arises. Extensive experiments on 313 matrices show that, compared to the existing model, the proposed models achieve considerable improvements in all communication cost metrics. These improvements lead to an average decrease of 30 percent in parallel SpMV time on 512 processors for 70 matrices with high irregularity.
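
The Python sketch below makes the reduce-phase cost metrics concrete. It computes the communication of column-parallel SpMV from two inputs: a given column partition (the first-phase result) and a given assignment of each y_i reduce task to a processor (the second-phase decision). It is an illustrative model only and rests on two assumptions. First, each non-owner processor sends one word (its partial sum) per y_i it contributes to. Second, all words between a sender/receiver pair travel in one message. The function name reduce_comm_metrics and its data layout are hypothetical. This is not the authors' hypergraph model or vertex weighting scheme.

```python
from collections import defaultdict


def reduce_comm_metrics(nonzeros, col_part, row_owner, num_procs):
    """Reduce-phase communication metrics for column-parallel SpMV (illustrative).

    nonzeros:  iterable of (i, j) positions of nonzeros in A
    col_part:  col_part[j] = processor owning column j (first-phase result)
    row_owner: row_owner[i] = processor owning y_i, i.e. the reduce task of
               row i (second-phase decision); hypothetical layout
    Assumes one word per partial result and one message per sender/receiver pair.
    """
    # Processors that compute a partial result for each y_i.
    contributors = defaultdict(set)
    for i, j in nonzeros:
        contributors[i].add(col_part[j])

    send_volume = [0] * num_procs  # words sent by each processor
    messages = set()               # distinct (sender, receiver) pairs
    for i, procs in contributors.items():
        owner = row_owner[i]
        for p in procs:
            if p != owner:
                # p sends its partial sum for y_i to the owner of y_i.
                send_volume[p] += 1
                messages.add((p, owner))

    return {
        "total_volume": sum(send_volume),       # bandwidth metric
        "max_send_volume": max(send_volume),    # send-volume balance
        "total_messages": len(messages),        # latency metric
        "send_volume": send_volume,
    }


# Toy 4x4 matrix on 2 processors: columns 0-1 on P0, columns 2-3 on P1.
nnz = [(0, 0), (0, 2), (1, 1), (1, 3), (2, 0), (2, 1), (3, 2), (3, 3)]
cols = [0, 0, 1, 1]
a = reduce_comm_metrics(nnz, cols, row_owner=[0, 0, 0, 1], num_procs=2)
b = reduce_comm_metrics(nnz, cols, row_owner=[0, 1, 0, 1], num_procs=2)
print(a["send_volume"], a["total_messages"])  # [0, 2] 1
print(b["send_volume"], b["total_messages"])  # [1, 1] 2
```

In this toy example both reduce-task assignments incur the same total volume (2 words). The second halves the maximum send volume but needs an extra message, which shows the tension between send-volume balance and message count. Assigning a row's reduce task to a processor that holds none of that row's nonzeros raises the total volume. For example, row_owner=[0, 0, 1, 0] gives 4 words instead of 2.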

Updated: 2020-06-01
