Vectorization cost modeling for NEON, AVX and SVE
Performance Evaluation (IF 2.2), Pub Date: 2020-07-01, DOI: 10.1016/j.peva.2020.102106
Angela Pohl , Biagio Cosenza , Ben Juurlink

Abstract Compiler optimization passes employ cost models to determine if a code transformation will yield performance improvements. When this assessment is inaccurate, compilers apply transformations that are not beneficial, or refrain from applying ones that would have improved the code. We analyze the accuracy of the cost models used in LLVM's and GCC's vectorization passes for three different instruction set architectures, including both traditional SIMD architectures with a defined, fixed vector register size (AVX2 and NEON) and a novel instruction set with a scalable vector size (SVE). In general, speedup is overestimated, resulting in mispredictions and a weak to medium correlation between predicted and actual performance gain. We therefore propose a novel cost model that is based on a code's intermediate representation with refined memory access pattern features. Using linear regression techniques, this platform-independent model is fitted to AVX2 and NEON hardware, as well as an SVE simulator. Results show that the fitted model significantly improves the correlation between predicted and measured speedup (AVX2: +52% for training data, +13% for validation data), reduces the average error of the speedup prediction (SVE: -43% for training data, -36% for validation data), as well as the number of mispredictions (NEON: -88% for training data, -71% for validation data) for more than 80 code patterns.
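As a rough illustration of the fitting approach described in the abstract, the sketch below trains a linear regression cost model on IR-level feature counts to predict vectorization speedup. It is not the authors' implementation: the feature set (loads, stores, arithmetic ops, strided accesses) and all numbers are hypothetical placeholders, stand-ins for the refined memory access pattern features and measured speedups used in the paper.

```python
# A minimal sketch, assuming hypothetical IR feature counts per loop and
# measured speedups as the regression target. Not the paper's actual model.
import numpy as np

# Hypothetical training data: one row per code pattern.
# Columns: [#loads, #stores, #arithmetic ops, #strided accesses]
X = np.array([
    [4, 2, 8, 0],
    [2, 1, 4, 1],
    [8, 4, 16, 2],
    [1, 1, 2, 0],
], dtype=float)
# Measured speedups (vectorized vs. scalar) on the target hardware.
y = np.array([3.1, 1.8, 2.4, 1.2])

# Fit per-feature weights plus a bias term with least squares.
A = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_speedup(features):
    """Predict the speedup of vectorizing a loop from its IR feature counts."""
    return float(np.dot(np.append(features, 1.0), weights))

print(predict_speedup([3, 2, 6, 1]))
```

Refitting only the weight vector to a new target (e.g. an SVE simulator instead of AVX2 hardware) is what makes such a model platform-independent in structure while remaining platform-specific in its predictions.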

Updated: 2020-07-01