A performance predictor for implementation selection of parallelized static and temporal graph algorithms, Concurrency and Computation: Practice and Experience



Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-04-26, DOI: 10.1002/cpe.6267
Akif Rehman, Masab Ahmad, Omer Khan


Task-based execution of graph workloads allows various ordered and unordered implementations, with tasks representing dependencies between graph vertices and edges. This work explores graph algorithms in the context of ordered and unordered task-based implementations that trade off work efficiency against parallelism. The monotonicity of convergent graph solutions underlies this trade-off. The trade-off produces performance-dependent choices within and across machines (CPUs and GPUs), graph algorithms, and implementations (ordered, relaxed, and unordered). Input graphs further enlarge this choice space: this work analyzes temporally changing graphs in addition to the static graphs explored by prior work. The work first explores these algorithmic and architectural choices and finds that different graph workload-input combinations perform best on different architectural configurations. The resulting choice space is then analyzed and represented as characteristic variables that correlate with each choice space. Using these characteristic variables, this work proposes analytical and neural network models that correlate these choice spaces to find the best-performing implementation. The variables and prediction models proposed in this work are also integrated with a state-of-the-art performance predictor on a multiaccelerator setup, and show geometric performance gains of 54% on a CPU, 14% on a GPU, and 31.5% in a multiaccelerator setup over baseline implementations without performance prediction.
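The trade-off between work efficiency and parallelism described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: single-source shortest paths, the toy graph, and the names `sssp_ordered` and `sssp_unordered` are all assumptions chosen for illustration. The ordered variant processes vertices in priority order (Dijkstra-style), while the unordered variant drains a FIFO worklist whose tasks could in principle run in parallel but may repeat work. Because distances only ever decrease (monotonicity), both converge to the same answer.

```python
import heapq
from collections import deque

def sssp_ordered(graph, source):
    """Ordered execution: always process the closest pending vertex."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    relaxations = 0  # edge relaxations serve as a rough proxy for work
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            relaxations += 1
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, relaxations

def sssp_unordered(graph, source):
    """Unordered execution: process pending vertices in arrival order."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    work = deque([source])
    relaxations = 0
    while work:
        u = work.popleft()
        for v, w in graph[u]:
            relaxations += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w  # monotonic: distances only decrease
                work.append(v)         # v may be re-processed later
    return dist, relaxations

# Hypothetical toy graph: vertex -> list of (neighbor, edge weight).
graph = {0: [(2, 10), (1, 1)], 1: [(2, 1)], 2: [(3, 1)], 3: []}

ordered_dist, ordered_work = sssp_ordered(graph, 0)
unordered_dist, unordered_work = sssp_unordered(graph, 0)
print(ordered_dist == unordered_dist, ordered_work, unordered_work)
```

On this toy input the unordered variant performs more relaxations than the ordered one because vertex 2 is processed once with a provisional distance and again after a shorter path is found. Whether that extra work is repaid by added parallelism depends on the graph and the machine, which is the kind of choice the paper's predictor is meant to make.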


Updated: 2021-04-26
