Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-09-23, DOI: arxiv-2009.10976
Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy Lemieux, Mieszko Lis

The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are optimized for the access patterns during $\mathit{inference}$, however, they do not efficiently support the emerging sparse $\mathit{training}$ techniques. In this paper, we demonstrate (a) that accelerating sparse training requires a co-design approach where algorithms are adapted to suit the constraints of hardware, and (b) that hardware for sparse DNN training must tackle constraints that do not arise in inference accelerators. As proof of concept, we adapt a sparse training algorithm to be amenable to hardware acceleration; we then develop dataflow, data layout, and load-balancing techniques to accelerate it. The resulting system is a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model. Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$\times$ less energy and offers up to 4$\times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
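For readers unfamiliar with sparse training, the sketch below (plain PyTorch) illustrates the general idea the abstract refers to: weights are pruned by magnitude while the network trains, so no separate train-prune-retrain cycle is needed. It is only an illustrative stand-in, not the hardware-aware algorithm, dataflow, or load-balancing scheme Procrustes actually implements; the layer sizes, 90% sparsity target, ramp-up schedule, and the MagnitudePruner helper are all hypothetical.

# Minimal sketch of magnitude-based sparse training in PyTorch.
# NOT the Procrustes algorithm; all names and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class MagnitudePruner:
    """Keeps only the largest-magnitude fraction of each weight tensor."""
    def __init__(self, model, target_sparsity=0.9):
        self.model = model
        self.target_sparsity = target_sparsity
        self.masks = {}

    def update_masks(self, current_sparsity):
        # Recompute a binary mask per weight tensor so that `current_sparsity`
        # of the entries (the smallest in magnitude) are zeroed out.
        for name, p in self.model.named_parameters():
            if p.dim() < 2:                       # skip biases / norm parameters
                continue
            n = p.numel()
            k = max(1, int(n * (1.0 - current_sparsity)))   # weights to keep
            # Threshold = magnitude of the smallest weight that survives pruning.
            threshold = p.detach().abs().flatten().kthvalue(n - k + 1).values
            self.masks[name] = (p.detach().abs() >= threshold).float()

    def apply_masks(self):
        # Zero out pruned weights in place (called after every optimizer step).
        with torch.no_grad():
            for name, p in self.model.named_parameters():
                if name in self.masks:
                    p.mul_(self.masks[name])

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
pruner = MagnitudePruner(model, target_sparsity=0.9)

for step in range(100):                           # dummy training loop with random data
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Ramp sparsity up gradually so the network can adapt as weights are removed.
    sparsity = pruner.target_sparsity * min(1.0, step / 50)
    pruner.update_masks(sparsity)
    pruner.apply_masks()

In an accelerator setting, the point of the co-design argument above is that a mask-and-multiply scheme like this still stores and moves dense tensors; the paper's contribution is a dataflow and data layout that exploit the sparsity directly during training.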

Updated: 2020-09-24