SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference
arXiv - CS - Hardware Architecture Pub Date : 2019-09-02 , DOI: arxiv-1909.00557
Ye Yu and Niraj K. Jha

CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring dataflow styles that exploit computational parallelism, while the potential speedup from sparsity has not been adequately addressed: the computation and memory footprint of CNNs can be significantly reduced if sparsity is exploited during network evaluation. Some accelerator designs therefore explore sparsity encoding and evaluation. However, such encoding is typically applied to either activations or weights, and only during inference, even though activations and weights have been shown to exhibit high sparsity levels during training as well. Hence, sparsity-aware computation should also be considered in training. To further improve performance and energy efficiency, some accelerators evaluate CNNs with reduced precision, but this too is limited to inference, since naively reducing precision sacrifices network accuracy when used in training. In addition, CNN evaluation is usually memory-intensive, especially in training. In this paper, we propose SPRING, a SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and inference. SPRING supports both CNN training and inference. It uses a binary mask scheme to encode the sparsity of activations and weights, and a stochastic rounding algorithm to train CNNs with reduced precision without accuracy loss. To alleviate the memory bottleneck in CNN evaluation, especially in training, SPRING uses an efficient monolithic 3D NVM interface to increase memory bandwidth. Compared to an Nvidia GTX 1080 Ti GPU, SPRING achieves 15.6X, 4.2X, and 66.0X improvements in performance, power reduction, and energy efficiency, respectively, for CNN training, and 15.5X, 4.5X, and 69.1X improvements for inference.
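The binary mask scheme can be pictured with a short sketch: a tensor is stored as its nonzero values plus a mask that records, conceptually with one bit per element, where those values go. The code below is illustrative only, with made-up function names; the abstract does not specify the mask granularity or the hardware zero-skipping logic SPRING actually uses.

```python
import numpy as np

def encode_binary_mask(tensor):
    """Encode a tensor as (nonzero values, binary mask).

    Illustrative sketch of binary-mask sparsity encoding: the mask
    conceptually costs 1 bit per element (NumPy stores a byte), and
    only nonzero values are kept, so a tensor with sparsity s needs
    roughly (1 - s) * 16 + 1 bits per element at 16-bit precision
    instead of a full 16.
    """
    mask = (tensor != 0)
    values = tensor[mask]          # nonzeros in row-major order
    return values, mask

def decode_binary_mask(values, mask):
    """Reconstruct the dense tensor from the encoded form."""
    dense = np.zeros(mask.shape, dtype=values.dtype)
    dense[mask] = values
    return dense

# Example: a sparse activation map round-trips losslessly.
act = np.array([[0.0, 1.5, 0.0, 0.0],
                [2.0, 0.0, 0.0, 3.0]], dtype=np.float32)
vals, mask = encode_binary_mask(act)
assert np.array_equal(decode_binary_mask(vals, mask), act)
print(vals)        # [1.5 2.  3. ]
print(mask.sum())  # 3 of 8 elements are nonzero
```

Because the mask marks exactly which elements are nonzero, a sparsity-aware datapath can skip the zero operands entirely rather than multiplying by them.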
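Stochastic rounding, in its standard form, rounds a value up with probability equal to its fractional distance to the grid point above, so the expected rounded value equals the original and small gradient updates survive reduced-precision training on average. A minimal sketch under those assumptions (a fixed-point grid in NumPy; not SPRING's actual datapath) follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, frac_bits=8):
    """Round x to a fixed-point grid with `frac_bits` fractional bits.

    Each value is rounded up with probability equal to its fractional
    distance to the grid point above, and down otherwise, so the
    rounding is unbiased: E[stochastic_round(x)] == x. This keeps
    small gradient updates from being systematically lost, unlike
    round-to-nearest.
    """
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    frac = scaled - floor                       # distance to grid point below
    round_up = rng.random(np.shape(x)) < frac   # Bernoulli(frac)
    return (floor + round_up) / scale

# A tiny weight update that round-to-nearest would always absorb back
# to 0.5 (127.744 rounds to 128 on an 8-fractional-bit grid):
w, grad, lr = 0.5, 1e-3, 1.0
updates = [stochastic_round(w - lr * grad, frac_bits=8) for _ in range(10000)]
print(np.mean(updates))  # ~0.499, matches the exact value in expectation
```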

Updated: 2020-06-25