TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference
arXiv - CS - Hardware Architecture. Pub Date: 2020-09-01, DOI: arxiv-2009.00748
Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, and Andreas Moshovos

TensorDash is a hardware-level technique that enables data-parallel MAC units to exploit sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost, sparse input operand interconnect, comprising an 8-input multiplexer per multiplier input, with an area-efficient hardware scheduler. While the interconnect allows only a very limited set of movements per operand, the scheduler effectively extracts sparsity when it is present in the activations, weights, or gradients of neural networks. Over a wide set of models covering various applications, TensorDash accelerates the training process by $1.95\times$ while being $1.89\times$ more energy efficient ($1.6\times$ when on-chip and off-chip memory accesses are taken into account). While TensorDash works with any datatype, we demonstrate it with both single-precision floating-point and bfloat16 units.
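To make the dataflow concrete, below is a minimal Python sketch of the idea the abstract describes: zero-valued operand pairs are skipped, and an operand may only be promoted within a small window of stream positions, mirroring the 8-input multiplexer per multiplier input. The lane count, window policy, and greedy scheduler here are illustrative assumptions for a cycle-count model, not TensorDash's actual hardware design.

```python
import random

LANES = 4    # parallel multipliers in the MAC unit (illustrative choice)
WINDOW = 8   # positions an operand may move, mirroring the 8-input mux

def dense_cycles(n, lanes=LANES):
    """Baseline dense schedule: every operand pair, zero or not, takes a slot."""
    return -(-n // lanes)  # ceiling division

def sparse_cycles(a, b, lanes=LANES, window=WINDOW):
    """Greedy model of a sparsity-aware schedule: only effectual
    (nonzero x nonzero) pairs issue, and a pair can be promoted at
    most `window` positions ahead of the current stream front."""
    effectual = [i for i, (x, y) in enumerate(zip(a, b)) if x != 0 and y != 0]
    cycles, head = 0, 0
    while head < len(effectual):
        front = effectual[head]
        issued = 0
        # One cycle: issue up to `lanes` pairs that fall inside the mux window.
        while (head < len(effectual) and issued < lanes
               and effectual[head] < front + window):
            head += 1
            issued += 1
        cycles += 1
    return max(cycles, 1)

# Demo: a ~50%-sparse activation stream roughly halves the cycle count.
random.seed(0)
n = 1024
a = [random.random() if random.random() < 0.5 else 0.0 for _ in range(n)]
b = [random.random() for _ in range(n)]
print("dense cycles :", dense_cycles(n))
print("sparse cycles:", sparse_cycles(a, b))
```

In this toy model the limited window is what keeps the interconnect cheap: a lane can only steal work from nearby positions, so the achieved speedup falls slightly short of the ideal (effectual pairs / lanes) bound, which is the trade-off the abstract attributes to the low-cost mux plus scheduler combination.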

Updated: 2020-09-03