CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency
arXiv - CS - Hardware Architecture, Pub Date: 2020-11-03, arXiv:2011.01713
Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x-21x.
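The link between ternary weights and reduced switching activity can be illustrated with a small sketch. Ternary quantization maps each weight to {-1, 0, +1}; the zero weights form the sparsity that lets the accelerator silence the corresponding multiply-accumulates. The threshold rule below (delta = 0.7 x mean |w|, a common heuristic from the ternary-network literature) is an assumption for illustration; the paper's optimized training method for higher sparsity is not specified here.

```python
import numpy as np

def ternarize(w, delta_scale=0.7):
    """Threshold-based ternarization: map each weight to {-1, 0, +1}.

    Weights with |w| below the threshold delta become 0 (sparse), so the
    corresponding multiply-accumulates can be silenced, causing no
    switching activity. The 0.7 * mean(|w|) threshold is a common
    heuristic, not necessarily the rule used by CUTIE's training method.
    """
    delta = delta_scale * np.mean(np.abs(w))
    t = np.zeros_like(w, dtype=np.int8)
    t[w > delta] = 1
    t[w < -delta] = -1
    return t

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 16, 16))   # e.g. a 3x3 conv filter bank
t = ternarize(w)
sparsity = float(np.mean(t == 0))     # fraction of silenced weights
print(f"sparsity: {sparsity:.2f}")
```

For Gaussian-distributed weights this rule zeroes out roughly 40% of the values; binary networks, by contrast, force every weight to be +1 or -1, so no such silencing is possible.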

Updated: 2020-11-04