Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)
arXiv - CS - Performance Pub Date : 2021-04-27 , DOI: arxiv-2104.13242
Xingfu Wu, Michael Kruse, Prasanna Balaprakash, Hal Finkel, Paul Hovland, Valerie Taylor, Mary Hall

In this paper, we develop a ytopt autotuning framework that leverages Bayesian optimization to explore the parameter search space, and we compare four different supervised learning methods used within Bayesian optimization and evaluate their effectiveness. We select six of the most complex PolyBench benchmarks and apply the newly developed LLVM Clang/Polly loop optimization pragmas to them. We then use the autotuning framework to optimize the pragma parameters and improve their performance. The experimental results show that our autotuning approach outperforms the other compilation methods, providing the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations, effectively searching parameter spaces with up to 170,368 different configurations. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly's heuristic optimizations make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance. We then present loop autotuning without user knowledge, using a simple mctree autotuning framework to further improve the performance of the Floyd-Warshall benchmark. We also extend the ytopt autotuning framework to tune a deep learning application.
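The abstract's workflow can be sketched minimally: candidate pragma-parameter values are substituted into a source template, each instantiated variant is evaluated, and a model-guided search narrows in on the best configuration. Everything below is an illustrative stand-in, not the paper's actual code: `TEMPLATE`, `SPACE`, and `mock_runtime` are hypothetical, the real framework compiles each variant with Clang/Polly and measures execution time, and ytopt uses Bayesian optimization (a learned surrogate plus an acquisition function) rather than the crude nearest-neighbor surrogate here. The pragmas shown are standard Clang loop hints.

```python
import random

# Illustrative C source template: placeholders (#P0, #P1) are replaced
# by candidate parameter values before compiling and measuring.
TEMPLATE = """
#pragma clang loop unroll_count(#P0)
#pragma clang loop interleave_count(#P1)
for (int i = 0; i < N; i++)
    C[i] = A[i] * B[i];
"""

# Discrete search space: each placeholder maps to its candidate values.
SPACE = {"#P0": [1, 2, 4, 8, 16], "#P1": [1, 2, 4]}

def instantiate(cfg):
    """Substitute parameter values into the source template."""
    src = TEMPLATE
    for key, val in cfg.items():
        src = src.replace(key, str(val))
    return src

def mock_runtime(cfg):
    # Stand-in for compile-and-run timing, with its minimum at
    # (#P0=4, #P1=2); a real evaluation would execute the benchmark.
    return (cfg["#P0"] - 4) ** 2 + (cfg["#P1"] - 2) ** 2 + 1.0

def surrogate(cfg, history):
    # Predict runtime as the mean of the 3 most similar evaluated
    # configs (similarity = number of matching parameter values).
    nearest = sorted(history,
                     key=lambda h: -sum(cfg[k] == h[0][k] for k in cfg))[:3]
    return sum(t for _, t in nearest) / len(nearest)

def autotune(budget=40, seed=0):
    rng = random.Random(seed)
    draw = lambda: {k: rng.choice(v) for k, v in SPACE.items()}
    history = []
    for i in range(budget):
        if i < 8:
            cfg = draw()                      # initial random design
        else:                                 # model-guided selection
            cfg = min((draw() for _ in range(16)),
                      key=lambda c: surrogate(c, history))
        history.append((cfg, mock_runtime(cfg)))
    return min(history, key=lambda h: h[1])

best_cfg, best_time = autotune()
print(instantiate(best_cfg))
```

The evaluation budget (40 here, 200 in the paper) caps how many variants are compiled and run, which is what makes a guided search worthwhile compared with exhaustively enumerating spaces of up to 170,368 configurations.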

Updated: 2021-04-29