Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2021-01-12 , DOI: arxiv-2101.04354 Karina Vasquez, Yeshwanth Venkatesha, Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda
As neural networks gain widespread adoption in embedded devices, there is a
need for model compression techniques to facilitate deployment in
resource-constrained environments. Quantization is one of the go-to methods
yielding state-of-the-art model compression. Most approaches take a fully
trained model, apply different heuristics to determine the optimal
bit-precision for different layers of the network, and retrain the network to
regain any drop in accuracy. Based on Activation Density (AD), the proportion
of non-zero activations in a layer, we propose an in-training quantization method.
Our method calculates the bit-width for each layer during training, yielding a
mixed-precision model with competitive accuracy. Because lower-precision models
are trained throughout, our approach produces the final quantized model at
lower training complexity and eliminates the need for retraining. We run
experiments on the CIFAR-10, CIFAR-100, and TinyImageNet benchmark datasets
with VGG19/ResNet18 architectures, and report accuracy and energy estimates
for each. We achieve a ~4.5x benefit in terms of estimated
multiply-and-accumulate (MAC) reduction while reducing the training complexity
by 50% in our experiments. To further evaluate the energy benefits of our
proposed method, we develop a mixed-precision, scalable Processing-In-Memory (PIM)
hardware accelerator platform. The hardware platform incorporates shift-add
functionality for handling multi-bit precision neural network models.
Evaluating the quantized models obtained with our proposed method on the PIM
platform yields ~5x energy reduction compared to 16-bit models. Additionally,
we find that integrating AD-based quantization with AD-based pruning (both
conducted during training) yields up to ~198x and ~44x energy reductions for
the VGG19 and ResNet18 architectures, respectively, on the PIM platform,
compared to baseline 16-bit precision, unpruned models.
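The core quantity the abstract defines is Activation Density: the fraction of non-zero activations in a layer, which drives the per-layer bit-width choice during training. A minimal sketch of that idea is below; the `bitwidth_from_density` mapping is an illustrative assumption (a linear scaling clamped to a bit range), since the abstract does not state the paper's exact rule.

```python
import numpy as np

def activation_density(activations):
    """Activation Density (AD): fraction of non-zero activations in a layer,
    as defined in the abstract. Post-ReLU feature maps typically have many
    exact zeros, so AD is often well below 1."""
    return np.count_nonzero(activations) / activations.size

def bitwidth_from_density(ad, max_bits=16, min_bits=2):
    """Hypothetical AD -> bit-width mapping: denser layers get more bits.
    This linear rule is an assumption for illustration only; the paper's
    actual heuristic is not specified in the abstract."""
    bits = int(round(ad * max_bits))
    return max(min_bits, min(max_bits, bits))

# Example: a small post-ReLU activation map where 3 of 8 values are non-zero
acts = np.array([[0.0, 0.5, 0.0, 1.2],
                 [0.0, 0.0, 0.3, 0.0]])
ad = activation_density(acts)        # 0.375
bits = bitwidth_from_density(ad)     # 6 bits under the assumed linear rule
```

In an in-training scheme like the one described, such a mapping would be re-evaluated per layer as training proceeds, so low-density layers are quantized aggressively without a separate retraining pass.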
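The abstract also mentions shift-add functionality in the PIM accelerator for handling multi-bit precision models. The standard idea is that a multi-bit operand is processed one bit-plane at a time, with each set bit contributing a shifted copy of the other operand to an accumulator. The integer sketch below shows that arithmetic in software; it is not the paper's hardware design, only the shift-add principle it relies on.

```python
def shift_add_multiply(weight, activation, bits):
    """Multiply weight * activation via shift-and-add over the activation's
    binary digits, mirroring how a PIM crossbar with shift-add peripherals
    composes multi-bit results: bit-plane k contributes (weight << k)
    whenever bit k of the activation is set."""
    acc = 0
    for k in range(bits):
        if (activation >> k) & 1:
            acc += weight << k
    return acc

# Example: 5 * 11 with a 4-bit activation (11 = 0b1011)
result = shift_add_multiply(5, 11, 4)  # 5 + (5 << 1) + (5 << 3) = 55
```

Note the cost scales with the activation bit-width, which is why a mixed-precision model with AD-reduced bit-widths maps naturally onto such hardware.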
Updated: 2021-01-13