CURATING: A multi-objective based pruning technique for CNNs

https://doi.org/10.1016/j.sysarc.2021.102031

Abstract

As convolutional neural networks (CNNs) improve in accuracy, their model size and computational overheads have also increased. These overheads make it challenging to deploy CNNs on resource-constrained devices. Pruning is a promising technique to mitigate these overheads. In this paper, we propose a novel pruning technique called CURATING that treats the pruning of CNNs as a multi-objective optimization problem. CURATING retains filters that (i) are very different (less redundant) from each other in terms of their representation, (ii) have a high saliency score, i.e., they reduce the model accuracy drastically if pruned, and (iii) are likely to produce higher activations. To measure the similarity between filters, we treat a filter specific to an output channel as a probability distribution over spatial filters. The similarity matrix is leveraged to create filter embeddings, and we constrain our optimization problem to retain a diverse set of filters based on these filter embeddings. On a range of CNNs over well-known datasets, CURATING exercises a better or comparable tradeoff between model size, accuracy, and inference latency than existing techniques. For example, while pruning VGG16 on the ILSVRC-12 dataset, CURATING achieves higher accuracy and a smaller model size than the previous techniques.

Introduction

CNNs have shown phenomenal success in recent years. The power of CNN architectures such as AlexNet [1], InceptionV3 [2] and ResNet [3] lies in their ability to detect complicated structures in images through a hierarchical series of convolution (CONV) and pooling operations. These CNN architectures learn a rich representation of filters but are often overdesigned in the pursuit of meeting accuracy targets. Hence, many of the learned filters are redundant or have an insignificant influence on the final layer activations. Due to their high number of layers and intensive computation demands, these CNNs have large memory and power consumption and high inference latency.

Pruning is a promising technique to achieve a balance between accuracy and hardware overhead. Pruning also acts as a regularizer and thus helps the CNN to generalize better. Further, pruning can help in improving the robustness of CNN accelerators to soft errors and bit-flip attacks [4]. Pruned models also have a smaller model size and fewer computations, and hence lower energy consumption, than the original model. Hence, pruning can enable the use of deep learning in low-resource scenarios, such as traffic surveillance, battery-operated embedded systems, medical devices, etc. [5]. In this paper, we propose a pruning technique, named CURATING, for reducing the CNN model size without losing accuracy. Our contributions are as follows.

1. We propose CURATING, a multi-objective optimization based pruning technique (Section 3). CURATING retains filters that (i) are less redundant in terms of their filter representation, (ii) are likely to produce higher activations, and (iii) have a high saliency score, i.e., pruning these filters reduces the model accuracy drastically. These features make CURATING a robust technique.

2. Previous techniques determine the similarity of two filters based on the vector representation of the weights $W \in \mathbb{R}^{w \times h \times c_{in}}$. These techniques assume a specific order for the $c_{in}$ spatial filters. However, two 3D filters may be similar even if their similar spatial filters appear in a different order. Thus, the previous techniques fail to capture similarity in an order-invariant manner. We propose a novel approach to quantify similarity between the filters in an order-invariant manner. We represent a filter corresponding to an output channel as a probability distribution over spatial filters. For computing the pairwise similarity between filters in a given layer, we use probability divergence measures. This formulation ensures that the similarity of filters specific to output channels is invariant to the order of their spatial filter constituents. We utilize the pairwise similarity matrix for a given layer to create filter embeddings. These embeddings are used to constrain the optimization problem to select a diverse set of filters. As shown in Section 4.2, filter embeddings provide the highest contribution to our multi-objective optimization approach.

3. Unlike in Taylor pruning [6], we compute the saliency score for a filter by measuring the loss increase if the filter weights are set to zero. This is similar to the approach taken in some previous works [7]. Setting the filter weights to zero is closer to pruning the filter than setting the filter activations to zero. The larger the norm of a filter, the greater its chance of producing a high number of activations. We use the L1 norm of the filters to give preference to filters that are likely to produce a high number of activations.

4. We perform comprehensive experiments on many CNNs (AlexNet, VGG, InceptionV3, ResNet and LeNet5) on the ILSVRC-12, MNIST, CatsvsDogs, CIFAR-10 and CIFAR-100 datasets (Section 4). We observe that CURATING exercises a better or comparable tradeoff between model size, accuracy and inference latency than existing techniques [6], [8], [9], [10], [11], [12]. For example, CURATING achieves a better compression ratio at comparable or better accuracy in comparison to “DeepCompression” [8] and “Taylor Pruning” [6] for VGG16 on the ILSVRC-12 dataset. The ablation studies provide further insights into CURATING’s working and show that CURATING has tunable knobs for achieving a fine balance between different metrics.
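
The order-invariance argument in contribution 2 can be illustrated with a small NumPy sketch. The text above only states that a filter is represented as a probability distribution over spatial filters and compared using probability divergence measures; how that distribution is built is not given here. The sketch therefore makes two assumptions of its own: each spatial filter is assigned to its nearest prototype in a hypothetical `codebook`, and the Jensen-Shannon divergence is used as the divergence measure. It is not the authors' implementation, and it does not cover the filter-embedding step.

```python
import numpy as np

def filter_histograms(weights, codebook, eps=1e-8):
    """Represent each output filter as a distribution over codebook entries.

    weights:  (c_out, c_in, h, w) array of CONV weights.
    codebook: (K, h*w) array of prototype spatial filters (an assumption).
    Returns a (c_out, K) array whose rows sum to 1.
    """
    c_out, c_in = weights.shape[:2]
    flat = weights.reshape(c_out, c_in, -1)
    # Squared distance from every spatial filter to every prototype.
    dist = ((flat[:, :, None, :] - codebook[None, None, :, :]) ** 2).sum(axis=-1)
    nearest = dist.argmin(axis=-1)  # (c_out, c_in) prototype indices
    k = codebook.shape[0]
    hist = np.stack([np.bincount(row, minlength=k) for row in nearest]).astype(float)
    hist += eps  # smoothing so the logarithms below stay finite
    return hist / hist.sum(axis=1, keepdims=True)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pairwise_js(hists):
    """Symmetric matrix of pairwise divergences between output filters."""
    n = hists.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = js_divergence(hists[i], hists[j])
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6, 3, 3))   # toy layer: c_out=4, c_in=6, 3x3 kernels
W[1] = W[0][::-1]                   # filter 1 = filter 0 with channel order reversed
codebook = W.reshape(-1, 9)[rng.choice(24, size=5, replace=False)]
D = pairwise_js(filter_histograms(W, codebook))
```

In this toy layer, filters 0 and 1 differ as flattened weight vectors, yet their divergence `D[0, 1]` is zero because both contain the same set of spatial filters. A distance computed on the flattened vectors would treat them as different filters.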

Section snippets

Background and motivation

Notations: A CONV layer $l$ can be parameterized by the filter weights $W_l \in \mathbb{R}^{w \times h \times c_{in} \times c_{out}}$. Here, $c_{out}$ and $c_{in}$ denote the number of output and input channels, respectively. Also, $w$ and $h$ denote the width and height of the spatial filters, respectively. Each output channel $k$ in a layer $l$ is a collection of $c_{in}$ spatial filters of dimension $w \times h$. The number of activations in an output feature map is represented by $Z$.
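
With this notation, the two per-filter scores named in the contributions (the L1 norm of a filter, and the loss increase when a filter's weights are set to zero) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the weights are stored in the `(c_out, c_in, h, w)` layout that PyTorch uses for CONV layers, and `toy_loss`, `patches` and `targets` are hypothetical stand-ins for a real network and dataset. Each filter is applied at a single spatial position only.

```python
import numpy as np

def l1_norms(weights):
    """L1 norm of each output filter; weights has shape (c_out, c_in, h, w)."""
    return np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

def zeroing_saliency(weights, loss_fn):
    """Loss increase observed when each output filter is zeroed in turn."""
    base = loss_fn(weights)
    scores = np.empty(weights.shape[0])
    for k in range(weights.shape[0]):
        pruned = weights.copy()
        pruned[k] = 0.0  # set all weights of output filter k to zero
        scores[k] = loss_fn(pruned) - base
    return scores

# Hypothetical toy setup: one CONV response per input patch, ReLU, sum readout.
rng = np.random.default_rng(0)
c_out, c_in, h, w = 4, 3, 3, 3
W = rng.normal(size=(c_out, c_in, h, w))
W[2] *= 1e-3                                 # make filter 2 nearly inactive
patches = rng.normal(size=(32, c_in, h, w))  # stand-in for input data

def forward(weights):
    # Response of every filter to every patch, followed by ReLU and a sum.
    act = np.maximum(np.einsum("oihw,nihw->no", weights, patches), 0.0)
    return act.sum(axis=1)

targets = forward(W)  # the unpruned layer fits the targets exactly

def toy_loss(weights):
    return float(np.mean((forward(weights) - targets) ** 2))

norms = l1_norms(W)
saliency = zeroing_saliency(W, toy_loss)
```

Here filter 2 has both the smallest L1 norm and the smallest saliency, so either score alone would mark it for pruning. On a trained network the two scores need not agree, which is one motivation for combining several criteria rather than relying on a single one.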

Model pruning through individual weight pruning dates back to Optimal Brain Damage [13]

CURATING: A multi-objective pruning technique

Key idea of CURATING: Previous methods use only one approach for pruning and, thus, forgo the benefits of other approaches. For example, on using only the saliency score, the prior information embedded in the norm-based methods is not utilized. In CURATING, we look at filter pruning as a multi-objective optimization problem. We leverage the information provided by both the filter weights and their saliency for a target dataset. Furthermore, we propose a novel method to create filter embeddings

Implementation and results

We perform experiments in PyTorch on a GeForce GTX 1070 GPU with a batch size of 32 [34]. The DeepCompression technique [8] uses pruning, quantization, and Huffman encoding. Of these, we compare against the use of pruning only, since quantization and Huffman encoding are orthogonal ideas and can benefit all pruning techniques.

Conclusion

In this paper, we present a novel pruning technique, named CURATING, which optimizes for multiple objectives. CURATING retains filters that have low redundancy, high saliency and produce high activations. Our comprehensive experiments on multiple CNNs and datasets confirm that CURATING achieves a better tradeoff between accuracy, model size and latency than existing techniques. Our future work will focus on further design-space exploration and evaluation of our technique on CNNs

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (39)

  • S. Mittal, A survey on modeling and improving reliability of DNN algorithms and accelerators, J. Syst. Archit. (2020)
  • S. Mittal, A survey on optimized implementation of deep learning models on the NVIDIA Jetson platform, J. Syst. Archit. (2019)
  • A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: NIPS,...
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with...
  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: CVPR,...
  • P. Rajput, S. Nag, S. Mittal, Detecting usage of mobile phones using deep learning technique, 6th EAI International...
  • P. Molchanov, S. Tyree, T. Karras, T. Aila, J. Kautz, Pruning convolutional neural networks for resource efficient...
  • P. Molchanov, A. Mallya, S. Tyree, I. Frosio, J. Kautz, Importance estimation for neural network pruning, in:...
  • S. Han, H. Mao, W.J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and...
  • H. Li, A. Kadav, I. Durdanovic, H. Samet, H.P. Graf, Pruning filters for efficient convnets, in: International...
  • Y. Li, S. Gu, C. Mayer, L.V. Gool, R. Timofte, Group sparsity: The hinge between filter pruning and decomposition for...
  • Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, C. Zhang, Learning efficient convolutional networks through network slimming,...
  • Y. He, G. Kang, X. Dong, Y. Fu, Y. Yang, Soft filter pruning for accelerating deep convolutional neural networks, in:...
  • Y. LeCun et al., Optimal brain damage
  • B. Hassibi, D.G. Stork, G.J. Wolff, Optimal brain surgeon and general network pruning, in: IEEE International...
  • D. Molchanov et al., Variational dropout sparsifies deep neural networks
  • D.P. Kingma et al., Variational dropout and the local reparameterization trick
  • C. Louizos et al., Learning sparse neural networks through L_0 regularization (2017)
  • H. Zhuo et al., Scsp: Spectral clustering filter pruning with soft self-adaption manners (2018)

    Santanu Pattanayak currently works as a Staff Machine Learning Specialist at Qualcomm Corp R&D and is author of the deep learning book Pro Deep Learning with TensorFlow - A Mathematical Approach to Advanced Artificial Intelligence in Python. He has around 13 years of overall work experience. Prior to joining Qualcomm, Santanu has worked in companies such as GE, RBS, Capgemini, and IBM. He graduated with a degree in electrical engineering from Jadavpur University, Kolkata and is an avid math enthusiast. Santanu has completed a master’s degree in data science from Indian Institute of Technology (IIT), Hyderabad. He also devotes his time to data science hackathons and Kaggle competitions where he ranks within the top 500 across the globe.

    Subhrajit Nag is currently pursuing a Ph.D. degree in the CSE department at IIT Hyderabad.

    Dr. Sparsh Mittal is currently working as an assistant professor at IIT Roorkee, India. He received the B.Tech. degree from IIT Roorkee, India and the Ph.D. degree from Iowa State University (ISU), USA. He has worked as a Post-Doctoral Research Associate at Oak Ridge National Lab (ORNL), USA and as an assistant professor in CSE at IIT Hyderabad. He was the graduating topper of his batch in B.Tech. and his B.Tech. project received the best project award. He has received a fellowship from ISU and a performance award from ORNL. He has published more than 100 papers at top venues and his research has been covered by technical websites such as InsideHPC, HPCWire, Phys.org, and ScientificComputing. He is an associate editor of Elsevier’s Journal of Systems Architecture. He has given invited talks at the ISC Conference in Germany, New York University, University of Michigan and Xilinx (Hyderabad). His research has been funded by Semiconductor Research Corporation (USA), Intel, Redpine Signals and SERB.

    This work is supported by Semiconductor Research Corporation.
