Compiler-assisted Operator Template Library for DNN Accelerators
International Journal of Parallel Programming (IF 1.5), Pub Date: 2021-03-25, DOI: 10.1007/s10766-021-00701-6
Jiansong Li, Wei Cao, Xiao Dong, Guangli Li, Xueying Wang, Peng Zhao, Lei Liu, Xiaobing Feng

Although many dedicated accelerators are gaining popularity for their performance and energy efficiency in the deep neural network (DNN) domain, high-level programming support for these accelerators remains thin. In contrast to existing research targeting whole DNNs, we examine this problem at a finer-grained level: operators. Due to performance concerns, operator programmers may have to take hand-written assembly as their first choice, which is error-prone and involves many programming chores. To alleviate this problem, we propose TOpLib, a compiler-assisted template library. By providing a unified user-view abstraction, TOpLib allows programmers to express computational kernels with high-level tensor primitives, which are automatically lowered into low-level intrinsic primitives via expression templates. Moreover, because memory management is performance-critical and the optimization strategy of expression templates is limited to enumeration-based rewriting rules, we implement TOpLib with a compiler-assisted approach. We address the memory reuse challenges in the compiler, which allows TOpLib to make full use of on-chip buffers and results in better performance. Experiments over 55 typical DNN operators demonstrate that TOpLib can generate scalable code with performance faster than or on par with hand-written assembly versions.
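To make the expression-template idea in the abstract concrete, the following is a minimal C++ sketch of how a high-level tensor expression could be lowered to intrinsic calls through enumerated rewrite rules. It is not TOpLib's API: every name here (`Tensor`, `Expr`, `Mul`, `Add`, `lower`, and the `intrinsic::` functions) is hypothetical, and the "intrinsics" are plain scalar loops standing in for accelerator instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical intrinsics: scalar loops standing in for accelerator
// vector instructions. issued() counts how many were called.
namespace intrinsic {
inline int& issued() { static int n = 0; return n; }
inline void vadd(float* d, const float* a, const float* b, std::size_t n) {
    ++issued();
    for (std::size_t i = 0; i < n; ++i) d[i] = a[i] + b[i];
}
inline void vmul(float* d, const float* a, const float* b, std::size_t n) {
    ++issued();
    for (std::size_t i = 0; i < n; ++i) d[i] = a[i] * b[i];
}
inline void vfma(float* d, const float* a, const float* b, const float* c,
                 std::size_t n) {  // d = a * b + c
    ++issued();
    for (std::size_t i = 0; i < n; ++i) d[i] = a[i] * b[i] + c[i];
}
}  // namespace intrinsic

// CRTP base: marks a type as a node of the expression tree.
template <class E> struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Expression nodes only record their operands; nothing is computed yet.
template <class L, class R> struct Mul : Expr<Mul<L, R>> {
    const L& lhs; const R& rhs;
    Mul(const L& l, const R& r) : lhs(l), rhs(r) {}
};
template <class L, class R> struct Add : Expr<Add<L, R>> {
    const L& lhs; const R& rhs;
    Add(const L& l, const R& r) : lhs(l), rhs(r) {}
};

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Mul<L, R>(l.self(), r.self());
}
template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Add<L, R>(l.self(), r.self());
}

struct Tensor : Expr<Tensor> {
    std::vector<float> data;
    explicit Tensor(std::size_t n, float v = 0.0f) : data(n, v) {}
    std::size_t size() const { return data.size(); }
    // Assigning an expression triggers lowering (defined below).
    template <class E> Tensor& operator=(const Expr<E>& e);
};

// Enumerated rewrite rules: one overload per recognised expression shape.
// An expression with no matching rule fails to compile, which illustrates
// the limitation of enumeration-based rewriting.
inline void lower(const Add<Tensor, Tensor>& e, Tensor& out) {
    assert(e.lhs.size() == out.size() && e.rhs.size() == out.size());
    intrinsic::vadd(out.data.data(), e.lhs.data.data(), e.rhs.data.data(),
                    out.size());
}
inline void lower(const Mul<Tensor, Tensor>& e, Tensor& out) {
    assert(e.lhs.size() == out.size() && e.rhs.size() == out.size());
    intrinsic::vmul(out.data.data(), e.lhs.data.data(), e.rhs.data.data(),
                    out.size());
}
// a * b + c is fused into a single multiply-add, with no temporary tensor.
inline void lower(const Add<Mul<Tensor, Tensor>, Tensor>& e, Tensor& out) {
    assert(e.lhs.lhs.size() == out.size() && e.lhs.rhs.size() == out.size() &&
           e.rhs.size() == out.size());
    intrinsic::vfma(out.data.data(), e.lhs.lhs.data.data(),
                    e.lhs.rhs.data.data(), e.rhs.data.data(), out.size());
}

template <class E> Tensor& Tensor::operator=(const Expr<E>& e) {
    lower(e.self(), *this);
    return *this;
}
```

With tensors `a`, `b`, `c`, and `out` of equal size, the statement `out = a * b + c;` builds the type `Add<Mul<Tensor, Tensor>, Tensor>` at compile time and resolves to the fused rule, so only one intrinsic is issued. Note that this sketch covers only the lowering step; the buffer reuse that the abstract assigns to the compiler is outside what such rules can express.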




Updated: 2021-03-25