MARViN -- Multiple Arithmetic Resolutions Vacillating in Neural Networks
arXiv - CS - Artificial Intelligence. Pub Date: 2021-07-28, DOI: arxiv-2107.13490
Lorenz Kummer, Kevin Sidak, Tabea Reichmann, Wilfried Gansterer

Quantization is a technique for reducing deep neural network (DNN) training and inference times, which is crucial for training in resource-constrained environments or time-critical inference applications. State-of-the-art (SOTA) quantization approaches focus on post-training quantization, i.e. quantization of pre-trained DNNs for speeding up inference. Very little work on quantized training exists, and existing approaches neither allow dynamic intra-epoch precision switches nor employ an information-theory-based switching heuristic. Usually, existing approaches require full-precision refinement afterwards and enforce a global word length across the whole DNN. This leads to suboptimal quantization mappings and resource usage. Recognizing these limits, we introduce MARViN, a new quantized training strategy using information-theory-based intra-epoch precision switching, which decides on a per-layer basis which precision should be used in order to minimize quantization-induced information loss. Note that any quantization must leave enough precision so that future learning steps do not suffer from vanishing gradients. We achieve an average speedup of 1.86 compared to a float32 baseline while limiting mean accuracy degradation on AlexNet/ResNet to only -0.075%.
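The abstract gives no implementation details for the per-layer, information-theory-based precision selection, so the following is only a minimal sketch of how such a switching step could look. It assumes uniform symmetric quantization and uses the KL divergence between histograms of full-precision and quantized weights as a proxy for quantization-induced information loss; the function names, candidate bit widths, and tolerance are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: per-layer word-length selection at an intra-epoch
# switching point, driven by an information-theoretic criterion.
# All names, bit-width candidates, and thresholds are illustrative only.
import numpy as np

def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to the given word length."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    return np.round(weights / scale) * scale

def kl_divergence(p: np.ndarray, q: np.ndarray, n_bins: int = 128) -> float:
    """KL divergence between value histograms of two weight tensors,
    used here as a proxy for quantization-induced information loss."""
    lo, hi = min(p.min(), q.min()), max(p.max(), q.max())
    hp, _ = np.histogram(p, bins=n_bins, range=(lo, hi), density=True)
    hq, _ = np.histogram(q, bins=n_bins, range=(lo, hi), density=True)
    hp, hq = hp + 1e-12, hq + 1e-12   # avoid log(0) and division by zero
    hp, hq = hp / hp.sum(), hq / hq.sum()
    return float(np.sum(hp * np.log(hp / hq)))

def select_layer_precision(weights: np.ndarray,
                           candidate_bits=(4, 8, 16),
                           max_info_loss: float = 0.05) -> int:
    """Pick the smallest word length whose information loss stays below
    the tolerance; fall back to full precision (32-bit) otherwise."""
    for bits in sorted(candidate_bits):
        if kl_divergence(weights, quantize(weights, bits)) <= max_info_loss:
            return bits
    return 32

# Example: decide a precision per layer at a switching point within an epoch.
layers = {"conv1": np.random.randn(64, 3, 7, 7),
          "fc": np.random.randn(1000, 512)}
per_layer_bits = {name: select_layer_precision(w) for name, w in layers.items()}
print(per_layer_bits)
```

In this sketch, layers whose weight distributions survive coarse quantization with little distributional distortion get small word lengths, while sensitive layers keep higher precision, which mirrors the per-layer, loss-minimizing selection described in the abstract.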

Updated: 2021-07-29