Improved Basic Block Reordering, IEEE Transactions on Computers



Improved Basic Block Reordering
IEEE Transactions on Computers (IF 3.6), Pub Date: 2020-12-01, DOI: 10.1109/tc.2020.2982888
Andy Newell, Sergey Pupyrev

Basic block reordering is an important step for profile-guided binary optimization. The state-of-the-art goal for basic block reordering is to maximize the number of fall-through branches. However, we demonstrate that such orderings may result in suboptimal performance of the instruction cache and the I-TLB. We propose a new algorithm that relies on a model combining the effects of fall-through and caching behavior. As the details of modern processor caching are quite complex and often unknown, we show how to use machine learning to select parameters that best trade off different caching effects to maximize binary performance. An extensive evaluation on a variety of applications, including Facebook production workloads, the open-source compilers Clang and GCC, and SPEC CPU benchmarks, indicates that the new method outperforms existing block reordering techniques, improving the resulting performance of applications with large code size. We have open sourced the code of the new algorithm as a part of a post-link binary optimization tool, BOLT.
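To make the idea of a model "combining the effects of fall-through and caching behavior" more concrete, the following is a minimal sketch of how a block layout could be scored. It is not the authors' implementation: the function name `layout_score`, the window sizes, and the weights are all hypothetical values chosen for illustration (the abstract says only that such parameters are selected with machine learning). The sketch gives full credit to a branch that becomes a fall-through and partial, distance-decayed credit to a short jump, on the assumption that nearby targets are more likely to share a cache line or page.

```python
from typing import Dict, List, Tuple

# Hypothetical parameters, chosen only for illustration.
# The abstract does not state the values used by the authors.
FALLTHROUGH_WEIGHT = 1.0   # credit for a branch that becomes a fall-through
FORWARD_WEIGHT = 0.1       # maximum credit for a short forward jump
BACKWARD_WEIGHT = 0.1      # maximum credit for a short backward jump
FORWARD_WINDOW = 1024      # bytes within which a forward jump earns credit
BACKWARD_WINDOW = 640      # bytes within which a backward jump earns credit


def layout_score(
    order: List[str],
    sizes: Dict[str, int],
    edges: Dict[Tuple[str, str], int],
) -> float:
    """Score a block ordering from profiled branch counts.

    order: block names in layout order
    sizes: size of each block in bytes
    edges: (source, target) -> execution count from the profile
    """
    # Assign each block a start address by laying blocks out back to back.
    addr: Dict[str, int] = {}
    offset = 0
    for block in order:
        addr[block] = offset
        offset += sizes[block]

    score = 0.0
    for (src, dst), count in edges.items():
        src_end = addr[src] + sizes[src]
        dst_start = addr[dst]
        if dst_start == src_end:
            # Target immediately follows the source: a fall-through.
            score += FALLTHROUGH_WEIGHT * count
        elif dst_start > src_end:
            # Forward jump: credit decays linearly with distance.
            dist = dst_start - src_end
            if dist < FORWARD_WINDOW:
                score += FORWARD_WEIGHT * count * (1 - dist / FORWARD_WINDOW)
        else:
            # Backward jump (including loops back to the same block).
            dist = src_end - dst_start
            if dist < BACKWARD_WINDOW:
                score += BACKWARD_WEIGHT * count * (1 - dist / BACKWARD_WINDOW)
    return score
```

A reordering algorithm would search for the ordering that maximizes such a score. For example, with three 16-byte blocks and edges `A -> B` (count 100) and `A -> C` (count 10), the ordering `["A", "B", "C"]` scores higher than `["A", "C", "B"]`, because the hotter branch becomes the fall-through while the colder one still earns partial credit as a short forward jump.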


Updated: 2020-12-01
