CURATING: A multi-objective based pruning technique for CNNs☆
Introduction
CNNs have shown phenomenal success in recent years. The power of CNN architectures such as AlexNet [1], InceptionV3 [2] and ResNet [3] lies in their ability to detect complicated structures in images through a hierarchical series of convolution (CONV) and pooling operations. These CNN architectures learn a rich representation of filters but are often overdesigned in pursuit of accuracy targets. Hence, many of the learned filters are redundant or have an insignificant influence on the final layer activations. Due to their large number of layers and intensive computation demands, these CNNs consume substantial memory and power and incur high inference latency.
Pruning is a promising technique to achieve a balance between accuracy and hardware overhead. Pruning also acts as a regularizer and, thus, helps the CNN to generalize better. Further, pruning can help in improving the robustness of CNN accelerators to soft-errors and bit-flip attacks [4]. Pruned models also have a smaller size, require fewer computations and, hence, consume less energy than the original model. Pruning can therefore enable deep learning in low-resource scenarios, such as traffic surveillance, battery-operated embedded systems, medical devices, etc. [5]. In this paper, we propose a pruning technique, named CURATING, for reducing the CNN model size without losing accuracy. Our contributions are as follows.
1. We propose CURATING, a multi-objective optimization based pruning technique (Section 3). CURATING retains filters that (i) are less redundant in terms of their filter representation, (ii) are likely to produce higher activations, and (iii) have a high saliency score, i.e., pruning these filters reduces the model accuracy drastically. These features make CURATING a robust technique.
2. Previous techniques determine the similarity of two filters based on the vector representation of the weights. These techniques assume a specific order for the spatial filters. However, two 3D filters may be similar even if they have similar spatial filters appearing in a different order. Thus, the previous techniques fail to capture similarity in an order-invariant manner. We propose a novel approach to quantify similarity between the filters in an order-invariant manner. We represent a filter corresponding to an output channel as a probability distribution over spatial filters. For computing the pairwise similarity between filters in a given layer, we use probability divergence measures. This formulation ensures that the similarity of filters specific to output channels is invariant to the order of their spatial filter constituents. We utilize the pairwise similarity matrix for a given layer to create filter embeddings. These embeddings are used to constrain the optimization problem to select a diverse set of filters. As shown in Section 4.2, filter embeddings provide the highest contribution to our multi-objective optimization approach.
3. Unlike Taylor pruning [6], we compute the saliency score for a filter by measuring the loss increase when the filter weights are set to zero. This is similar to the approach taken in some previous works [7]. Setting the filter weights to zero is closer to pruning the filter than setting the filter activations to zero. The larger the norm of a filter, the greater its chance of producing a high number of activations. We therefore use the norm of the filters to give preference to filters that are likely to produce a high number of activations.
4. We perform comprehensive experiments on many CNNs (AlexNet, VGG, InceptionV3, ResNet and LeNet5) on the ILSVRC-12, MNIST, CatsvsDogs, CIFAR-10 and CIFAR-100 datasets (Section 4). We observe that CURATING achieves a better or comparable tradeoff between model size, accuracy and inference latency than existing techniques [6], [8], [9], [10], [11], [12]. For example, CURATING achieves a better compression ratio at comparable or better accuracy than “DeepCompression” [8] and “Taylor Pruning” [6] for VGG16 on the ILSVRC-12 dataset. The ablation studies provide further insights into CURATING’s working and show that CURATING has tunable knobs for achieving a fine balance between different metrics.
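The order-invariance argument in contribution 2 can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed `codebook` of prototype spatial filters, the nearest-prototype assignment, the smoothing constant `eps`, and the choice of Jensen-Shannon divergence are all assumptions made for illustration, since the text above only states that a filter is represented as a probability distribution over spatial filters and compared with a probability divergence measure.

```python
import math
from collections import Counter

def nearest_code(spatial_filter, codebook):
    """Index of the codebook entry closest (squared Euclidean) to a flattened spatial filter."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: dist(spatial_filter, codebook[k]))

def filter_distribution(filter_3d, codebook, eps=1e-6):
    """Smoothed histogram over codebook entries for one output-channel filter.

    filter_3d is a list of flattened spatial filters, one per input channel.
    The histogram ignores the order in which the spatial filters appear.
    """
    counts = Counter(nearest_code(s, codebook) for s in filter_3d)
    total = len(filter_3d) + eps * len(codebook)
    return [(counts.get(k, 0) + eps) / total for k in range(len(codebook))]

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical codebook of three flattened 2x2 spatial filters.
codebook = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]]

filter_a = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]]
filter_b = list(reversed(filter_a))   # same spatial filters, different order
filter_c = [[1, 1, 1, 1]] * 3         # a genuinely different filter

dist_a = filter_distribution(filter_a, codebook)
dist_b = filter_distribution(filter_b, codebook)
dist_c = filter_distribution(filter_c, codebook)

print(js_divergence(dist_a, dist_b))  # reordered filter: divergence is zero
print(js_divergence(dist_a, dist_c))  # different filter: divergence is positive
```

A comparison based on the flattened weight vectors would treat `filter_a` and `filter_b` as different, because their entries disagree position by position, whereas the distribution-based comparison treats them as identical.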
Section snippets
Background and motivation
Notations: A CONV layer is parameterized by its filter weights, a four-dimensional tensor indexed by output channel, input channel, and the width and height of the spatial filters. Each output channel in a layer is a collection of two-dimensional spatial filters, one per input channel. The notation also covers the number of activations in an output feature map.
Model pruning through individual weight pruning dates back to Optimal Brain Damage [13]
CURATING: A multi-objective pruning technique
Key idea of CURATING: Previous methods use only one approach for pruning and, thus, forgo the benefits of other approaches. For example, on using only the saliency score, the prior information embedded in the norm-based methods is not utilized. In CURATING, we look at filter pruning as a multi-objective optimization problem. We leverage the information provided by both the filter weights and their saliency for a target dataset. Furthermore, we propose a novel method to create filter embeddings
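As a rough illustration of combining weight-based and saliency-based information, the sketch below scores the filters of a toy layer by both their L2 norm and the loss increase observed when their weights are zeroed. It is only a sketch under stated assumptions: the toy `forward` model, the min-max `normalize` step, the weighted sum in `combined_scores` and its `alpha` parameter are hypothetical and are not taken from the paper, whose actual formulation is a constrained multi-objective optimization that also uses filter embeddings.

```python
import math

def forward(filters, readout, x):
    """Toy layer: one ReLU activation per filter, followed by a linear readout."""
    acts = [max(0.0, sum(w * xi for w, xi in zip(f, x))) for f in filters]
    return sum(r * a for r, a in zip(readout, acts))

def loss(filters, readout, data):
    """Mean squared error over (input, target) pairs."""
    return sum((forward(filters, readout, x) - y) ** 2 for x, y in data) / len(data)

def saliency(filters, readout, data, idx):
    """Loss increase when the weights of filter idx are set to zero."""
    zeroed = [[0.0] * len(f) if i == idx else f for i, f in enumerate(filters)]
    return loss(zeroed, readout, data) - loss(filters, readout, data)

def l2_norm(f):
    """L2 norm of one filter's weights."""
    return math.sqrt(sum(w * w for w in f))

def normalize(values):
    """Rescale to [0, 1] so that the two criteria are comparable."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def combined_scores(filters, readout, data, alpha=0.5):
    """Hypothetical weighted sum of normalized norm and saliency scores."""
    norms = normalize([l2_norm(f) for f in filters])
    sals = normalize([saliency(filters, readout, data, i) for i in range(len(filters))])
    return [alpha * n + (1 - alpha) * s for n, s in zip(norms, sals)]

# Toy layer with two strong filters and one near-zero filter.
filters = [[1.0, 0.0], [0.0, 1.0], [0.01, 0.01]]
readout = [1.0, 1.0, 1.0]
inputs = [[1.0, 2.0], [2.0, 1.0], [0.5, 0.5]]
# Targets come from the unpruned layer, so the baseline loss is zero.
data = [(x, forward(filters, readout, x)) for x in inputs]

scores = combined_scores(filters, readout, data)
prune_first = scores.index(min(scores))
print(scores, prune_first)  # the near-zero filter gets the lowest score
```

In this toy setting the near-zero filter has both the smallest norm and the smallest saliency, so the two criteria agree. They can disagree on real networks, which is the motivation for treating them as separate objectives.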
Implementation and results
We perform experiments in PyTorch on a GeForce GTX 1070 GPU with a batch size of 32 [34]. The DeepCompression technique [8] uses pruning, quantization, and Huffman encoding. Of these, we compare against the use of pruning only, since quantization and Huffman encoding are orthogonal ideas and can benefit all pruning techniques.
Conclusion
In this paper, we present a novel pruning technique, named CURATING, which optimizes for multiple objectives. CURATING retains filters that have low redundancy, high saliency and produce high activations. Our comprehensive experiments on multiple CNNs and datasets confirm that CURATING achieves a better tradeoff between accuracy, model size and latency than existing techniques. Our future work will focus on further design-space exploration and evaluation of our technique on CNNs
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (39)
- A survey on modeling and improving reliability of DNN algorithms and accelerators, J. Syst. Archit. (2020)
- A survey on optimized implementation of deep learning models on the NVIDIA Jetson platform, J. Syst. Archit. (2019)
- A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: NIPS,...
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with...
- K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: CVPR,...
- P. Rajput, S. Nag, S. Mittal, Detecting usage of mobile phones using deep learning technique, 6th EAI International...
- P. Molchanov, S. Tyree, T. Karras, T. Aila, J. Kautz, Pruning convolutional neural networks for resource efficient...
- P. Molchanov, A. Mallya, S. Tyree, I. Frosio, J. Kautz, Importance estimation for neural network pruning, in:...
- S. Han, H. Mao, W.J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and...
- H. Li, A. Kadav, I. Durdanovic, H. Samet, H.P. Graf, Pruning filters for efficient convnets, in: International...
- Optimal brain damage
- Variational dropout sparsifies deep neural networks
- Variational dropout and the local reparameterization trick
- Learning sparse neural networks through regularization
- Scsp: Spectral clustering filter pruning with soft self-adaption manners
Santanu Pattanayak currently works as a Staff Machine Learning Specialist at Qualcomm Corp R&D and is author of the deep learning book Pro Deep Learning with TensorFlow - A Mathematical Approach to Advanced Artificial Intelligence in Python. He has around 13 years of overall work experience. Prior to joining Qualcomm, Santanu has worked in companies such as GE, RBS, Capgemini, and IBM. He graduated with a degree in electrical engineering from Jadavpur University, Kolkata and is an avid math enthusiast. Santanu has completed a master’s degree in data science from Indian Institute of Technology (IIT), Hyderabad. He also devotes his time to data science hackathons and Kaggle competitions where he ranks within the top 500 across the globe.
Subhrajit Nag is currently pursuing a Ph.D. degree in CSE department at IIT Hyderabad.
Dr. Sparsh Mittal is currently working as an assistant professor at IIT Roorkee, India. He received the B.Tech. degree from IIT Roorkee, India and the Ph.D. degree from Iowa State University (ISU), USA. He has worked as a Post-Doctoral Research Associate at Oak Ridge National Lab (ORNL), USA and as an assistant professor at CSE, IIT Hyderabad. He was the graduating topper of his batch in B.Tech. and his B.Tech. project received the best project award. He has received a fellowship from ISU and a performance award from ORNL. He has published more than 100 papers at top venues and his research has been covered by technical websites such as InsideHPC, HPCWire, Phys.org, and ScientificComputing. He is an associate editor of Elsevier’s Journal of Systems Architecture. He has given invited talks at the ISC Conference in Germany, New York University, University of Michigan and Xilinx (Hyderabad). His research has been funded by Semiconductor Research Corporation (USA), Intel, Redpine Signals and SERB.
☆ This work is supported by Semiconductor Research Corporation.