Compression and Speed-up of Convolutional Neural Networks Through Dimensionality Reduction for Efficient Inference on Embedded Multiprocessor
Journal of Signal Processing Systems ( IF 1.8 ) Pub Date : 2021-01-04 , DOI: 10.1007/s11265-020-01616-0
Lucas Fernández Brillet , Nicolas Leclaire , Stéphane Mancini , Marina Nicolas , Sébastien Cleyet-Merle , Jean-Paul Henriques , Claude Delnondedieu

The computational complexity of state-of-the-art Convolutional Neural Networks (CNNs) makes their integration into embedded systems with low power budgets a challenging task, which calls for the joint design and adaptation of hardware and algorithms. In this paper, we propose a new general CNN compression method that reduces both the number of parameters and the number of operations. To this end, we introduce a new Principal Component Analysis (PCA) based compression, which relies on an optimal transformation (in the mean-squared-error sense) of the filters of each layer into a new representation space in which the convolutions are then applied. Compression is achieved by dimensioning this new representation space, with an arbitrarily controlled accuracy degradation of the new CNN. PCA compression is evaluated on multiple state-of-the-art networks and datasets and applied to a binary face-classification network. To show the versatility of the method and its usefulness for adapting a CNN to a hardware computing system, the compressed face-classification network is implemented and evaluated on a custom embedded multiprocessor. Results show that, for example, an overall compression rate of 2x can be achieved on a compact ResNet-32 model on the CIFAR-10 dataset with only a negligible 2% loss of network accuracy, while compression rates of up to 11x can be achieved on specific layers with negligible accuracy loss.
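The core idea described in the abstract — projecting a layer's filters onto a lower-dimensional PCA basis and sizing that space to trade accuracy for compression — can be sketched as follows. This is a hypothetical illustration with NumPy, not the authors' implementation: the function name `pca_compress_filters`, the filter shapes, and the choice of SVD for the principal components are all assumptions for the sketch.

```python
import numpy as np

def pca_compress_filters(W, d):
    """Approximate the filters W (C_out, C_in, k, k) of one conv layer by
    their projection onto the top-d principal components of the filter set.
    Returns the d basis filters, the per-filter mixing coefficients, the
    mean filter, and the reconstructed (approximated) filter bank."""
    C_out, C_in, k, _ = W.shape
    F = W.reshape(C_out, -1)              # one row per flattened filter
    mean = F.mean(axis=0)
    Fc = F - mean                         # center the filter "cloud"
    # Right singular vectors of the centered filters are the PCA directions
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    basis = Vt[:d]                        # (d, C_in*k*k) basis filters
    coeffs = Fc @ basis.T                 # (C_out, d) coefficients
    W_hat = (coeffs @ basis + mean).reshape(W.shape)
    return basis.reshape(d, C_in, k, k), coeffs, mean, W_hat

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16, 3, 3))   # toy filter bank
basis, coeffs, mean, W_hat = pca_compress_filters(W, d=32)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
```

At inference time the layer would convolve the input with the `d` basis filters and then mix the results with the `(C_out, d)` coefficient matrix (a 1x1 combination), so both parameters and operations scale with `d` instead of `C_out`. Random filters as above are the worst case for PCA; trained filters are typically correlated, which is what makes small `d` viable in practice.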



Updated: 2021-01-04