PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives,arXiv - CS - Programming Languages

PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
arXiv - CS - Programming Languages, Pub Date: 2020-06-02, DOI: arxiv-2006.02230
Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. The use of DNNs is becoming ubiquitous, including in software for image recognition, speech recognition, speech synthesis, and language translation, to name a few. The training of DNN architectures, however, is computationally expensive. Once the model is created, its use in the intended application (the inference task) is computationally heavy too, and inference needs to be fast for real-time use. Today, the norm for obtaining high performance is code for Deep Learning (DL) primitives that expert programmers optimize for specific architectures and expose via libraries. However, given the constant emergence of new DNN architectures, creating hand-optimized code is expensive, slow, and not scalable. To address this performance-productivity challenge, in this paper we present compiler algorithms to automatically generate high-performance implementations of DL primitives that closely match the performance of hand-optimized libraries. We develop novel data reuse analysis algorithms using the polyhedral model to derive efficient execution schedules automatically. In addition, because most DL primitives use some variant of matrix multiplication at their core, we develop a flexible framework where it is possible to plug in library implementations of matrix multiplication in lieu of a subset of the loops. We show that such a hybrid approach, combining the compiler with minimal library use, results in state-of-the-art performance. We also develop compiler algorithms to perform operator fusions that reduce data movement through the memory hierarchy of the computer system.
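The hybrid idea in the abstract (outer loops scheduled by a compiler, innermost loops replaced by a library matrix-multiplication call, with an elementwise operator fused in) can be illustrated with a minimal Python sketch. This is not PolyDL's algorithm or API: the names `microkernel` and `tiled_matmul_relu` are hypothetical, NumPy's `@` operator stands in for a hand-optimized GEMM microkernel, and the tile loop order is passed in by the caller rather than derived by any data reuse analysis.

```python
import itertools
import numpy as np


def microkernel(a_tile, b_tile, c_tile):
    """Hypothetical stand-in for a library GEMM kernel.

    Accumulates a_tile @ b_tile into c_tile in place.
    """
    c_tile += a_tile @ b_tile


def tiled_matmul_relu(a, b, tile=32, loop_order=("i", "j", "k")):
    """Compute relu(a @ b) with tiled outer loops and a fused ReLU.

    loop_order is the order of the three tile loops (outermost first).
    In a real compiler this choice would come from an analysis; here
    it is simply a parameter (an assumption of this sketch).
    """
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    if sorted(loop_order) != ["i", "j", "k"]:
        raise ValueError("loop_order must be a permutation of i, j, k")

    c = np.zeros((m, n), dtype=np.result_type(a, b))
    starts = {
        "i": range(0, m, tile),
        "j": range(0, n, tile),
        "k": range(0, k, tile),
    }
    last_k = starts["k"][-1] if k > 0 else None

    # Outer tile loops, visited in the requested order.
    for idx in itertools.product(*(starts[name] for name in loop_order)):
        pos = dict(zip(loop_order, idx))
        i, j, kk = pos["i"], pos["j"], pos["k"]
        c_tile = c[i:i + tile, j:j + tile]  # view into c, updated in place
        # Inner loops are delegated to the "library" kernel.
        microkernel(a[i:i + tile, kk:kk + tile],
                    b[kk:kk + tile, j:j + tile],
                    c_tile)
        # Fused operator: for a fixed (i, j), k-tiles are always visited
        # in increasing order, so the tile is complete at the last one.
        if kk == last_k:
            np.maximum(c_tile, 0, out=c_tile)
    return c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((70, 45))
    b = rng.standard_normal((45, 50))
    out = tiled_matmul_relu(a, b, tile=16, loop_order=("j", "k", "i"))
    print(np.allclose(out, np.maximum(a @ b, 0)))
```

Applying the ReLU while the output tile is still being worked on, instead of in a separate pass over the full result, is the kind of data-movement saving that operator fusion targets. Every loop order produces the same result; only the order in which tiles are touched, and hence the reuse pattern, changes.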

Updated: 2020-11-18
