SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference
IEEE Journal of Solid-State Circuits (IF 5.4) Pub Date: 2020-12-29, DOI: 10.1109/jssc.2020.3043870
Jie-Fang Zhang, Ching-En Lee, Chester Liu, Yakun Sophia Shao, Stephen W. Keckler, Zhengya Zhang

Recent developments in deep neural network (DNN) pruning introduce data sparsity that enables deep learning applications to run more efficiently on resource- and energy-constrained hardware platforms. However, these sparse models require specialized hardware structures to exploit the sparsity to its full extent for storage, latency, and efficiency improvements. In this work, we present the sparse neural acceleration processor (SNAP) to exploit unstructured sparsity in DNNs. SNAP uses parallel associative search to discover valid weight (W) and input activation (IA) pairs from compressed, unstructured, sparse W and IA data arrays. The associative search allows SNAP to maintain a 75% average compute utilization. SNAP follows a channel-first dataflow and uses a two-level partial sum (psum) reduction dataflow to eliminate access contention at the output buffer and cut the psum writeback traffic by 22× compared with state-of-the-art DNN accelerator designs. SNAP's psum reduction dataflow can be configured in two modes to support general convolution (CONV) layers, pointwise CONV, and fully connected layers. A prototype SNAP chip is implemented in a 16-nm CMOS technology. The 2.3-mm² test chip is measured to achieve a peak effectual efficiency of 21.55 TOPS/W (16 b) at 0.55 V and 260 MHz for CONV layers with 10% weight and activation densities. Operating on a pruned ResNet-50 network, the test chip achieves a peak throughput of 90.98 frames/s at 0.80 V and 480 MHz, dissipating 348 mW.
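To make the associative-search idea concrete, below is a minimal Python sketch of the index-matching step: in compressed sparse form, a weight and an activation only produce useful work when their input-channel indices coincide. The function names and the (index, value) layout are illustrative assumptions, not the chip's storage format, and the sequential loop stands in for what the hardware performs with parallel comparators.

```python
# Software sketch of the index-matching idea behind an associative
# search over compressed sparse data. Names and data layout are
# illustrative, not SNAP's actual hardware format.

def compress(dense):
    """Drop zeros, keeping (channel_index, value) pairs."""
    return [(c, v) for c, v in enumerate(dense) if v != 0]

def match_pairs(w_comp, ia_comp):
    """Find all effectual (weight, activation) pairs: entries that share
    a channel index. Hardware does these comparisons in parallel; this
    loop is the sequential equivalent."""
    ia_by_channel = {c: v for c, v in ia_comp}
    return [(w, ia_by_channel[c]) for c, w in w_comp if c in ia_by_channel]

def sparse_dot(w_dense, ia_dense):
    """Dot product over the channel dimension using only matched pairs."""
    pairs = match_pairs(compress(w_dense), compress(ia_dense))
    return sum(w * ia for w, ia in pairs)

# At low density, most multiply slots in a dense datapath would be
# wasted on zeros; here only the two matched pairs compute.
w  = [0, 3, 0, 0, -2, 0, 0, 0]
ia = [0, 5, 0, 1,  4, 0, 0, 0]
assert sparse_dot(w, ia) == 3 * 5 + (-2) * 4
```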

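The writeback-traffic claim can likewise be illustrated with a toy model of a two-level psum reduction: products targeting the same output element are first summed locally within each processing element (level 1) and then across processing elements (level 2), so the output buffer sees a single write instead of one read-modify-write per product. This is a simplified software analogy under assumed names, not SNAP's actual reduction-tree microarchitecture.

```python
# Toy model of a two-level partial-sum (psum) reduction dataflow.

def naive_writebacks(products_per_pe):
    """Baseline: every product updates the output buffer directly."""
    return sum(len(p) for p in products_per_pe)

def two_level_reduce(products_per_pe):
    """Level 1: in-PE adder trees; level 2: cross-PE reduction.
    Returns the output value and the writeback count (always one)."""
    level1 = [sum(p) for p in products_per_pe]
    return sum(level1), 1

# Three PEs hold products for the same output element.
products = [[2, 12], [30], [56, 9]]
out, writebacks = two_level_reduce(products)
assert out == 109 and writebacks == 1
assert naive_writebacks(products) == 5  # vs. one with the reduction tree
```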
Updated: 2021-01-29