Compiler-assisted Operator Template Library for DNN Accelerators, International Journal of Parallel Programming



Compiler-assisted Operator Template Library for DNN Accelerators
International Journal of Parallel Programming (IF 1.5), Pub Date: 2021-03-25, DOI: 10.1007/s10766-021-00701-6
Jiansong Li, Wei Cao, Xiao Dong, Guangli Li, Xueying Wang, Peng Zhao, Lei Liu, Xiaobing Feng

Although many dedicated accelerators are gaining popularity for their performance and energy efficiency in the deep neural network (DNN) domain, high-level programming support for these accelerators remains thin. In contrast to existing research that targets whole DNNs, we examine the problem at a finer-grained level: operators. Because of performance concerns, operator programmers often have to make hand-written assembly their first choice, which is error-prone and involves many programming chores. To alleviate this problem, we propose TOpLib, a compiler-assisted template library. By providing a unified user-view abstraction, TOpLib allows programmers to express computational kernels with high-level tensor primitives, which are automatically lowered into low-level intrinsic primitives via expression templates. Moreover, because memory management is performance-critical and the optimization strategy of expression templates is limited to enumeration-based rewriting rules, we implement TOpLib with a compiler-assisted approach. We handle memory reuse in the compiler, which allows TOpLib to make full use of on-chip buffers and achieve better performance. Experiments on 55 typical DNN operators demonstrate that TOpLib can generate scalable code whose performance is faster than or on par with hand-written assembly versions.
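To make the expression-template idea in the abstract concrete, the following is a minimal C++ sketch of how a high-level tensor expression might be lowered to intrinsic calls through an enumerated rewriting rule. It is not TOpLib's actual API: the namespace `sketch`, the types `Tensor`, `Add`, and `Mul`, the functions `eval`, `lower`, and `operand`, and the "intrinsics" `vec_add`, `vec_mul`, and `vec_fma` are all hypothetical names, and the intrinsics are plain scalar loops standing in for accelerator instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {  // every name in this namespace is hypothetical

struct Tensor {
    std::vector<float> data;
    explicit Tensor(std::vector<float> v) : data(std::move(v)) {}
};

// Expression nodes record operands by reference; nothing is computed yet.
// The references are only valid within the full expression that built them.
template <class L, class R> struct Add { const L& lhs; const R& rhs; };
template <class L, class R> struct Mul { const L& lhs; const R& rhs; };

template <class T> struct is_expr : std::false_type {};
template <> struct is_expr<Tensor> : std::true_type {};
template <class L, class R> struct is_expr<Add<L, R>> : std::true_type {};
template <class L, class R> struct is_expr<Mul<L, R>> : std::true_type {};

inline std::size_t size_of(const Tensor& t) { return t.data.size(); }
template <class L, class R> std::size_t size_of(const Add<L, R>& e) { return size_of(e.lhs); }
template <class L, class R> std::size_t size_of(const Mul<L, R>& e) { return size_of(e.lhs); }

// High-level tensor primitives: building an expression tree, not computing.
template <class L, class R,
          class = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
Add<L, R> operator+(const L& l, const R& r) {
    assert(size_of(l) == size_of(r));
    return Add<L, R>{l, r};
}
template <class L, class R,
          class = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
Mul<L, R> operator*(const L& l, const R& r) {
    assert(size_of(l) == size_of(r));
    return Mul<L, R>{l, r};
}

// Stand-ins for low-level intrinsics; the counters show which ones were issued.
struct Stats { int add = 0, mul = 0, fma = 0; };
inline Stats& stats() { static Stats s; return s; }
inline void vec_add(float* d, const float* a, const float* b, std::size_t n) {
    ++stats().add;
    for (std::size_t i = 0; i < n; ++i) d[i] = a[i] + b[i];
}
inline void vec_mul(float* d, const float* a, const float* b, std::size_t n) {
    ++stats().mul;
    for (std::size_t i = 0; i < n; ++i) d[i] = a[i] * b[i];
}
inline void vec_fma(float* d, const float* a, const float* b, const float* c, std::size_t n) {
    ++stats().fma;
    for (std::size_t i = 0; i < n; ++i) d[i] = a[i] * b[i] + c[i];
}

using Scratch = std::deque<std::vector<float>>;  // naive temporaries, no buffer reuse

// A Tensor operand is used in place; a sub-expression is lowered into scratch.
inline const float* operand(const Tensor& t, Scratch&) { return t.data.data(); }
template <class E> const float* operand(const E& e, Scratch& s) {
    s.emplace_back(size_of(e));
    float* buf = s.back().data();
    lower(e, buf, s);  // resolved at instantiation via argument-dependent lookup
    return buf;
}

// Generic lowering: one intrinsic per node.
template <class L, class R> void lower(const Add<L, R>& e, float* dst, Scratch& s) {
    const float* a = operand(e.lhs, s);
    const float* b = operand(e.rhs, s);
    vec_add(dst, a, b, size_of(e));
}
template <class L, class R> void lower(const Mul<L, R>& e, float* dst, Scratch& s) {
    const float* a = operand(e.lhs, s);
    const float* b = operand(e.rhs, s);
    vec_mul(dst, a, b, size_of(e));
}
// Enumerated rewriting rule: (a * b) + c becomes a single fused multiply-add.
// This overload is more specialized, so it wins whenever the pattern matches.
template <class A, class B, class C>
void lower(const Add<Mul<A, B>, C>& e, float* dst, Scratch& s) {
    const float* a = operand(e.lhs.lhs, s);
    const float* b = operand(e.lhs.rhs, s);
    const float* c = operand(e.rhs, s);
    vec_fma(dst, a, b, c, size_of(e));
}

template <class E> Tensor eval(const E& e) {
    Tensor out(std::vector<float>(size_of(e)));
    Scratch scratch;
    lower(e, out.data.data(), scratch);
    return out;
}

}  // namespace sketch
```

With three equal-length tensors `a`, `b`, and `c`, a call such as `sketch::eval(a * b + c)` matches the fused rule and issues one `vec_fma` rather than a `vec_mul` followed by a `vec_add`. Every such pattern has to be written out by hand as its own overload, and the scratch storage here is allocated per sub-expression with no reuse, which illustrates the two limitations the abstract gives as the motivation for moving memory management into the compiler.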


Updated: 2021-03-25
