Neural Networks

Volume 133, January 2021, Pages 112-122

AutoTune: Automatically Tuning Convolutional Neural Networks for Improved Transfer Learning

https://doi.org/10.1016/j.neunet.2020.10.009

Highlights

  • This paper introduces AutoTune, a framework that automatically finds the optimal number of layers to fine-tune for a target dataset, improving transfer learning.

  • Bayesian Optimization is applied to tune the pre-trained CNN layers using the knowledge of the target dataset.

  • Several experiments are performed on CalTech-101, CalTech-256, and Stanford Dogs using the proposed AutoTune to establish its efficacy compared to the state of the art.

  • In order to validate the robustness of the proposed framework across different CNN architectures, we experiment with three popular CNNs, namely, VGG-16 (Simonyan and Zisserman, 2014), ResNet-50 (He et al., 2016), and DenseNet-121 (Huang et al., 2017).

Abstract

Transfer learning enables solving a specific task that has limited data by using pre-trained deep networks trained on large-scale datasets. Typically, while transferring the learned knowledge from the source task to the target task, the last few layers are fine-tuned (re-trained) over the target dataset. However, these layers were originally designed for the source task and might not be suitable for the target task. In this paper, we introduce a mechanism for automatically tuning Convolutional Neural Networks (CNN) for improved transfer learning. The pre-trained CNN layers are tuned with the knowledge from the target data using Bayesian Optimization. First, we replace the softmax layer of the base CNN model with a new softmax layer whose number of neurons equals the number of classes in the target task, and train this final layer. Next, the CNN is tuned automatically by observing the classification performance on the validation data (greedy criterion). To evaluate the performance of the proposed method, experiments are conducted on three benchmark datasets, namely, CalTech-101, CalTech-256, and Stanford Dogs. The classification results obtained through the proposed AutoTune method outperform the standard baseline transfer learning methods over the three datasets, achieving 95.92%, 86.54%, and 84.67% accuracy over CalTech-101, CalTech-256, and Stanford Dogs, respectively. The experimental results obtained in this study show that tuning the pre-trained CNN layers with the knowledge from the target dataset yields better transfer learning ability. The source code is available at https://github.com/JekyllAndHyde8999/AutoTune_CNN_TransferLearning.

Introduction

The ability of Convolutional Neural Networks (CNN) to perform feature extraction and decision making in one shot creates enormous demand in several application areas, such as object recognition (Krizhevsky, Sutskever, & Hinton, 2012), language translation (Zhang, Zong, et al., 2015), and many more. However, the performance of deep learning models is sensitive to small changes in both the network hyperparameters, such as the number of layers and the filter dimensions of a convolution layer, and the training parameters, such as the learning rate and the activation function. Most of the CNNs available in the literature are carefully designed in terms of these hyperparameters by domain experts (He et al., 2016, Krizhevsky et al., 2012, Simonyan and Zisserman, 2014, Szegedy et al., 2015).

In recent years, researchers have made substantial efforts to automatically learn the structure of a CNN for a specific task (Liu et al., 2018, Zoph et al., 2018), a problem known as Neural Architecture Search (NAS). Although these methods outperform most hand-engineered architectures, the search process requires huge computational resources and time to train the proxy CNNs explored during the architecture search (Zoph & Le, 2016). NAS becomes particularly challenging when working with small datasets. Transfer learning is a widely adopted technique that reduces the demand for both large computational resources and training data while providing promising performance on small datasets.

Many machine learning algorithms assume that the training (source) data and future (target) data have a similar distribution. In this direction, many metric-learning algorithms (Dong et al., 2017, Hu et al., 2015) have been proposed in the literature. To mention a few, the Deep Transfer Metric Learning (DTML) method introduced by Hu et al. (2015) trains a discriminative distance network for cross-dataset visual recognition by maximizing the inter-class variations and minimizing the intra-class variations. Similarly, Dong et al. (2017) proposed Ensemble Discriminative Local Metric Learning (EDLML), which aims to learn a sub-space that keeps all intra-class samples as close as possible while samples belonging to different classes remain well separated. Shi, Du, and Zhang (2015) presented a semi-supervised domain adaptation approach that finds new representations of the images belonging to the source domain using multiple linear transformations.

Typically, while transferring the learned knowledge from the source task to the target task, the classification layer of the pre-trained CNN is dropped and a new softmax layer is stacked in its place, which is trained over the target dataset during transfer learning. The number of layers to be fine-tuned is decided mainly based on the size of the target dataset and the similarity between the source and target datasets (Karpathy & Johnson, 2017). However, the pre-trained CNN model is designed for the source dataset and may not perform well over the target dataset.
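As a point of reference, this standard baseline can be sketched in a few lines of Keras-style code: the source classifier is dropped and a new softmax head, sized for the target classes, is attached on top of the (initially frozen) pre-trained backbone. This is a minimal sketch under assumed settings (a VGG-16 backbone and a hypothetical num_target_classes), not the authors' released implementation.

```python
# Minimal transfer-learning baseline sketch (assumed TensorFlow/Keras setup).
import tensorflow as tf

num_target_classes = 101  # hypothetical, e.g. CalTech-101

# Load the source-task (ImageNet) weights and drop the original classifier.
base = tf.keras.applications.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),
)
base.trainable = False  # freeze all pre-trained layers initially

# Stack a new softmax layer sized for the target dataset.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_target_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```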

In this paper, we attempt to automatically tune the pre-trained CNN to make it suitable for the target task/dataset. To achieve this objective, we first replace the softmax layer of the pre-trained CNN with a new softmax layer whose number of neurons equals the number of classes in the target dataset. Next, we automatically tune the layers of the CNN using Bayesian Optimization (Frazier, 2018). It is a well-known fact that the initial layers of a CNN represent primitive features such as edges and blobs, which are generic to many tasks. On the other hand, the final layers of a CNN represent features that are very specific to the learning task (Zeiler & Fergus, 2014). Based on this idea, in our work the layers of the CNN are tuned from right to left (i.e., from the final layers towards the initial layers) by observing the network's performance on the validation data. The results obtained through our experiments indicate that tuning the optimum number of layers with respect to the target task leads to better image classification performance in the context of transfer learning. Fig. 1 shows an overview of the proposed idea of improving the transfer learning process. Next, we provide a survey of the literature in the specific research area addressed in this study.
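To make the right-to-left, validation-driven tuning idea concrete, the following simplified sketch greedily increases the number of final backbone layers that are re-trained and keeps the depth that yields the best validation accuracy. It only illustrates the greedy validation criterion, not the paper's Bayesian-optimization procedure; build_model, train_ds, and val_ds are hypothetical helpers and datasets.

```python
# Greedy illustration of right-to-left fine-tuning depth selection
# (assumed TensorFlow/Keras setup; helpers below are hypothetical).
import tensorflow as tf

best_acc, best_k = 0.0, 0
for k in [0, 2, 4, 8, 12]:  # number of final backbone layers to re-train
    model = build_model(num_target_classes)  # hypothetical: new head on a frozen backbone
    backbone = model.layers[0]               # assumes the backbone is the first layer
    for layer in backbone.layers[len(backbone.layers) - k:]:
        layer.trainable = True               # unfreeze only the last k layers
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train_ds, validation_data=val_ds, epochs=5, verbose=0)
    acc = max(history.history["val_accuracy"])
    if acc > best_acc:                       # greedy criterion on validation data
        best_acc, best_k = acc, k
```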

Section snippets

Related works

The hierarchical feature extraction ability of deep neural networks enables a network trained on one task to be adopted for another. A network pre-trained over a large-scale dataset (source task) can be utilized to solve a specific task/problem (target task) by transferring the knowledge learned from the source task. Transfer learning has been widely adopted in domains where collecting annotated examples is an expensive (labor-intensive) and time-consuming task, such as biomedical (Raghu et al., 2020, Shin …

Methods

The task of automatically tuning a pre-trained CNN w.r.t. the target dataset can be formulated as a black-box optimization problem, where we do not have direct access to the objective function. In this paper, tuning the CNN layers with the knowledge of the target dataset for improved transfer learning is achieved using Bayesian Optimization (Frazier, 2018). Let $F$ be the objective function, which is of the form $F:\mathbb{R}^{d} \rightarrow \mathbb{R}$.

The objective of the Bayesian optimization can be represented mathematically as finding $\theta^{\star} = \arg\max_{\theta \in \Theta} F(\theta)$, where $\Theta$ denotes the hyperparameter search space over which the CNN layers are tuned.
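A minimal sketch of this black-box formulation, assuming scikit-optimize's gp_minimize as the Bayesian-optimization backend: the objective trains a candidate configuration and returns its negated validation accuracy (gp_minimize minimizes). The search dimensions and the train_and_validate helper below are hypothetical placeholders, not the exact quantities optimized in the paper.

```python
# Bayesian optimization of validation accuracy over a small hyperparameter
# space (sketch; assumes scikit-optimize is installed).
from skopt import gp_minimize
from skopt.space import Categorical, Integer

space = [
    Integer(0, 16, name="num_tuned_layers"),            # how many final layers to re-train
    Categorical([64, 128, 256, 512], name="dense_units"),
]

def objective(params):
    num_tuned_layers, dense_units = params
    val_acc = train_and_validate(num_tuned_layers, dense_units)  # hypothetical helper
    return -val_acc                                      # negate: gp_minimize minimizes

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("best configuration:", result.x, "validation accuracy:", -result.fun)
```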

Architecture tuning search space

Here, we provide a detailed discussion of the search space used for the hyperparameters involved in the different layers of a deep neural network, such as the convolution, max-pooling, average-pooling, and dense layers. In our experimental settings, we consider both plain and skip-connection-based CNNs. More concretely, we consider popular CNNs such as VGG-16 (Simonyan & Zisserman, 2014), ResNet-50 (He et al., 2016), and DenseNet-121 (Huang et al., 2017). These networks are originally trained over the large-scale ImageNet dataset.
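For illustration only, a per-layer search space of the kind described here might be encoded as follows; the concrete ranges are hypothetical placeholders rather than the values used in the paper.

```python
# Hypothetical per-layer hyperparameter search space (illustrative ranges).
search_space = {
    "conv":     {"num_filters": [64, 128, 256, 512],
                 "kernel_size": [1, 3, 5]},
    "max_pool": {"pool_size": [2, 3]},
    "avg_pool": {"pool_size": [2, 3]},
    "dense":    {"units": [256, 512, 1024, 2048],
                 "activation": ["relu", "tanh"]},
}
```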

Experimental settings

We first describe the hyperparameters, such as the learning rate and the optimizer, employed while training the CNNs in Section 5.1. The pre-trained deep neural networks used to tune the CNN w.r.t. the target dataset are discussed in Section 5.2, and the datasets used in our experiments are discussed in Section 5.3.

Results and discussion

This section provides a detailed discussion of the classification performance obtained using the proposed AutoTune method. To find better-performing CNNs, experiments are conducted on three benchmark datasets, including CalTech-101, CalTech-256, and Stanford Dogs. To find the target-specific CNN layers for improved transfer learning, three popular CNNs are employed, namely VGG-16 (Simonyan & Zisserman, 2014), ResNet-50 (He et al., 2016), and DenseNet-121 (Huang et al., 2017). …

Conclusion and future work

We propose a novel framework for automatically tuning the pre-trained CNN w.r.t. the target dataset while transferring the learned knowledge from the source task to the target task. We compare the Bayesian and Random search strategies for tuning the network hyperparameters. Experiments are conducted using the VGG-16, ResNet-50, and DenseNet-121 models over the CalTech-101, CalTech-256, and Stanford Dogs datasets. These models are originally trained over the large-scale ImageNet dataset. …

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

We are grateful to NVIDIA Corporation for supporting this research with the donation of an NVIDIA Titan Xp GPU (GPU-900-1G611-2500-000T).

References (61)

  • Baker, B., et al. (2017). Accelerating neural architecture search using performance prediction.
  • Basha, S., et al. (2020). AutoFCL: Automatically tuning fully connected layers for transfer learning.
  • Bergstra, J. S., et al. Algorithms for hyper-parameter optimization.
  • Cai, S., Zhang, L., Zuo, W., & Feng, X. (2016). A probabilistic collaborative representation based approach for pattern...
  • Chu, B., et al. Best practices for fine-tuning visual classifiers to new domains.
  • Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2019). AutoAugment: Learning augmentation strategies from...
  • Deng, J., et al. ImageNet: A large-scale hierarchical image database.
  • Dong, Y., et al. (2017). Dimensionality reduction and classification of hyperspectral images using ensemble discriminative local metric learning. IEEE Transactions on Geoscience and Remote Sensing.
  • Dubey, A., Gupta, O., Guo, P., Raskar, R., Farrell, R., & Naik, N. (2018). Pairwise confusion for fine-grained visual...
  • Duchi, J., et al. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research.
  • Elsken, T., et al. (2019). Neural architecture search: A survey. Journal of Machine Learning Research.
  • Fei-Fei, L., et al. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories.
  • Frazier, P. I. (2018). A tutorial on Bayesian optimization.
  • Griffin, G., et al. (2007). Caltech-256 object category dataset.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on...
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE...
  • Hu, J., Lu, J., & Tan, Y.-P. (2015). Deep transfer metric learning. In Proceedings of the IEEE conference on computer...
  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In...
  • Ioffe, S., et al. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.
  • Karpathy, A., et al. (2017). CS231n convolutional neural networks for visual recognition-transfer learning.