Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration
arXiv - CS - Hardware Architecture. Pub Date: 2020-11-25, DOI: arxiv-2011.13000
Reena Elangovan, Shubham Jain, Anand Raghunathan

Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network, requiring support for variable precision in DNN hardware. Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision. To efficiently support precision re-configurability in DNN accelerators, we introduce an approximate computing method wherein DNN computations are performed block-wise (a block is a group of bits) and re-configurability is supported at the granularity of blocks. Results of block-wise computations are composed in an approximate manner to enable efficient re-configurability. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for a given DNN. By varying the approximation configurations across DNNs, we achieve 1.11x-1.34x and 1.29x-1.6x improvement in system energy and performance respectively, over an 8-bit fixed-point (FxP8) baseline, with negligible loss in classification accuracy. Further, by varying the approximation configurations across layers and data-structures within DNNs, we achieve 1.14x-1.67x and 1.31x-1.93x improvement in system energy and performance respectively, with negligible accuracy loss.
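To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' accelerator design) of block-wise approximate multiplication: each fixed-point operand is split into bit-blocks, block-wise partial products are formed, and only the most significant partial products are composed, trading a small amount of accuracy for fewer block-level operations. The function and parameter names (blocked_mul_approx, block_size, keep) are hypothetical and chosen only for illustration.

```python
def split_blocks(x: int, bits: int = 8, block_size: int = 4):
    """Split an unsigned fixed-point value into bit-blocks, LSB first."""
    mask = (1 << block_size) - 1
    return [(x >> s) & mask for s in range(0, bits, block_size)]

def blocked_mul_approx(a: int, b: int, bits: int = 8,
                       block_size: int = 4, keep: int = 3) -> int:
    """Approximate a*b by composing only the `keep` highest-significance
    block-wise partial products (exact when `keep` covers all products)."""
    a_blk = split_blocks(a, bits, block_size)
    b_blk = split_blocks(b, bits, block_size)
    # (shift, partial product) for every pair of operand blocks
    terms = [((i + j) * block_size, ai * bj)
             for i, ai in enumerate(a_blk)
             for j, bj in enumerate(b_blk)]
    # Keep only the most significant terms; drop low-order cross terms
    terms.sort(key=lambda t: t[0], reverse=True)
    return sum(p << s for s, p in terms[:keep])

if __name__ == "__main__":
    a, b = 0xB7, 0x5C
    print("exact:", a * b, "approx:", blocked_mul_approx(a, b))
```

In this sketch, changing block_size and keep per layer or per data-structure plays the role of an "approximation configuration": fewer retained partial products means fewer block-level multiply-accumulate operations, at the cost of a bounded error in the low-order bits.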

Updated: 2020-12-01