Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs, ACM Transactions on Architecture and Code Optimization



Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs
ACM Transactions on Architecture and Code Optimization (IF 1.5), Pub Date: 2020-09-30, DOI: 10.1145/3416510
Aravind Acharya¹, Uday Bondhugula¹, Albert Cohen²


Polyhedral auto-transformation frameworks are known to find efficient loop transformations that maximize locality and parallelism and minimize synchronization. While complex loop transformations are routinely modeled in these frameworks, they tend to rely on ad hoc heuristics for loop fusion. Although there exist multiple loop fusion models with cost functions to maximize locality and parallelism, these models involve separate optimization steps rather than seamlessly integrating with other loop transformations like loop permutation, scaling, and shifting. Incorporating parallelism-preserving loop fusion heuristics into existing affine transformation frameworks like Pluto, LLVM-Polly, PPCG, and PoCC requires solving a large number of Integer Linear Programming formulations, which increase auto-transformation times significantly. In this work, we incorporate polynomial time loop fusion heuristics into the Pluto-lp-dfp framework. We present a data structure called the fusion conflict graph (FCG), which enables us to efficiently model loop fusion in the presence of other affine loop transformations. We propose a clustering heuristic to group the vertices of the FCG, which further enables us to provide three different polynomial time greedy fusion heuristics, namely, maximal fusion, typed fusion, and hybrid fusion, while maintaining the compile time improvements of Pluto-lp-dfp over Pluto. Our experiments reveal that the hybrid fusion model, in conjunction with Pluto’s cost function, finds efficient transformations that outperform PoCC and Pluto by mean factors of 1.8× and 1.07×, respectively, with a maximum performance improvement of 14× over PoCC and 2.6× over Pluto.
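The abstract does not spell out how the FCG is built or traversed, so the following is only a toy illustration of the general idea of a conflict graph driving a polynomial time greedy grouping. It assumes, purely for illustration, that each vertex is a whole statement and that an edge means the two statements must not end up in the same fused loop. The function name `greedy_fusion_groups`, the statement labels, and the single conflict edge are all hypothetical; this is plain greedy graph coloring, not the paper's maximal, typed, or hybrid fusion heuristic.

```python
from collections import defaultdict


def greedy_fusion_groups(vertices, conflicts):
    """Give each vertex the smallest group id not used by a conflicting vertex.

    vertices:  iterable of hashable labels, visited in the order given.
    conflicts: iterable of (u, v) pairs that must not share a group.
    """
    adjacency = defaultdict(set)
    for u, v in conflicts:
        adjacency[u].add(v)
        adjacency[v].add(u)

    group = {}
    for v in vertices:
        # Group ids already taken by neighbours that have been placed.
        taken = {group[n] for n in adjacency[v] if n in group}
        g = 0
        while g in taken:
            g += 1
        group[v] = g
    return group


# Hypothetical input: three statements with one assumed conflict,
# e.g. fusing S1 with S3 would be illegal or would lose parallelism.
vertices = ["S1", "S2", "S3"]
conflicts = [("S1", "S3")]

groups = greedy_fusion_groups(vertices, conflicts)
print(groups)
```

Each vertex is inspected once and compared only against its neighbours, so the sketch runs in time polynomial in the size of the graph, in contrast to the Integer Linear Programming formulations the abstract describes as expensive.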


Updated: 2020-09-30
