Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence
IEEE Open Journal of Circuits and Systems ( IF 2.4 ) Pub Date : 2021-05-13 , DOI: 10.1109/ojcas.2021.3072884
Robert A. Cohen , Hyomin Choi , Ivan V. Bajic

In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of leaky-ReLU and ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point intermediate activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding an intermediate layer of a split neural network for edge/cloud applications.
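The clip-then-quantize step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example clipping range, and the synthetic activation data are all assumptions chosen for illustration; the paper derives the optimal clipping range from its error model rather than fixing it by hand.

```python
import numpy as np

def clip_quantize(x, c_min, c_max, n_bits):
    """Clip activations to [c_min, c_max], then uniformly quantize
    the clipped values to 2**n_bits levels (coarse quantization)."""
    levels = 2 ** n_bits
    step = (c_max - c_min) / (levels - 1)          # uniform step size
    x_clipped = np.clip(x, c_min, c_max)           # clipping stage
    indices = np.round((x_clipped - c_min) / step) # integer indices to entropy-code
    return indices.astype(np.int64), step

def dequantize(indices, c_min, step):
    """Reconstruct approximate activation values from quantizer indices."""
    return c_min + indices * step

# Hypothetical leaky-ReLU-like activations: a heavy positive tail plus
# a small-magnitude negative tail, as produced by a split DNN layer.
rng = np.random.default_rng(0)
pos = rng.normal(size=1000) ** 2
neg = 0.01 * np.abs(rng.normal(size=1000))
x = np.where(rng.random(1000) > 0.5, pos, -neg)

# 2-bit coarse quantization with an illustrative clipping range.
idx, step = clip_quantize(x, c_min=-0.1, c_max=4.0, n_bits=2)
x_hat = dequantize(idx, -0.1, step)
```

The total reconstruction error then splits into a clipping term (values outside the range are saturated) and a quantization term (at most half a step for in-range values); the paper's models trade these two terms off to pick the range.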

Updated: 2021-05-13