A Learned Performance Model for the Tensor Processing Unit
arXiv - CS - Performance. Pub Date: 2020-08-03. DOI: arxiv-2008.01040. Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, and Mike Burrows
Accurate hardware performance models are critical to efficient code
generation. They can be used by compilers to make heuristic decisions, by
superoptimizers as a minimization objective, or by autotuners to find an
optimal configuration of a specific program. However, they are difficult to
develop because contemporary processors are complex, and the recent
proliferation of deep learning accelerators has increased the development
burden. We demonstrate a method of learning performance models from a corpus of
tensor computation graph programs for the Tensor Processing Unit (TPU). We
train a neural network over kernel-level sub-graphs from the corpus and find
that the learned model is competitive with a heavily optimized analytical cost
model used in the production XLA compiler.
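To make the idea concrete, here is a minimal sketch of what "a learned performance model over kernel-level sub-graphs" can look like. This is not the paper's architecture: the opcode set, feature choices, layer sizes, and function names below are illustrative assumptions. Each node of a kernel's computation sub-graph gets a feature vector (a one-hot opcode plus tensor-shape features), a few rounds of message passing mix neighbor information along graph edges, and a read-out head emits a predicted runtime. In practice the weights would be fit by regression against measured kernel runtimes rather than left at random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_OPCODES = 4          # e.g. {conv, matmul, add, reduce} -- an assumption
FEAT = NUM_OPCODES + 2   # one-hot opcode + log(output elems) + log(output bytes)
HIDDEN = 8

def node_features(opcode, out_elems, out_bytes):
    """Encode one graph node: one-hot opcode plus log-scaled shape features."""
    f = np.zeros(FEAT)
    f[opcode] = 1.0
    f[NUM_OPCODES] = np.log1p(out_elems)
    f[NUM_OPCODES + 1] = np.log1p(out_bytes)
    return f

# Randomly initialized parameters; a real model would train these against
# measured runtimes from a corpus of kernels.
W_in = rng.normal(size=(FEAT, HIDDEN)) * 0.1
W_msg = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1
w_out = rng.normal(size=HIDDEN) * 0.1

def predict_runtime(feats, adj, rounds=2):
    """feats: (n, FEAT) node features; adj: (n, n) 0/1 adjacency matrix."""
    h = np.tanh(feats @ W_in)
    for _ in range(rounds):
        h = np.tanh(h + adj @ h @ W_msg)   # one message-passing step
    g = h.mean(axis=0)                     # graph-level read-out
    return float(np.exp(g @ w_out))        # exp keeps the estimate positive

# Toy kernel sub-graph: matmul -> add -> reduce, chained linearly.
feats = np.stack([
    node_features(1, 1 << 20, 4 << 20),   # matmul node
    node_features(2, 1 << 20, 4 << 20),   # elementwise add node
    node_features(3, 1, 4),               # reduction node
])
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]], dtype=float)
print(predict_runtime(feats, adj))
```

The key design point this sketch shares with the abstract's approach is that the model consumes the graph structure directly, so one set of weights generalizes across kernels of different shapes and sizes, unlike a hand-built analytical model that must enumerate cases.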
Updated: 2020-08-04