AutoTune: Automatically Tuning Convolutional Neural Networks for Improved Transfer Learning
Introduction
The ability of Convolutional Neural Networks (CNNs) to perform feature extraction and decision making in one shot has created enormous demand in several application areas, such as object recognition (Krizhevsky, Sutskever, & Hinton, 2012), language translation (Zhang, Zong, et al., 2015), and many more. However, the performance of deep learning models is sensitive to small changes in both the network hyperparameters, such as the number of layers and the filter dimensions of a convolution layer, and the training settings, such as the learning rate and the activation function. Most CNNs available in the literature are carefully designed in terms of these hyperparameters by domain experts (He et al., 2016, Krizhevsky et al., 2012, Simonyan and Zisserman, 2014, Szegedy et al., 2015).
In recent years, researchers have made substantial efforts to automatically learn the structure of a CNN for a specific task (Liu et al., 2018, Zoph et al., 2018), an approach known as Neural Architecture Search (NAS). Although these methods outperform most hand-engineered architectures, the search process requires huge computational resources and time to train the proxy CNNs explored during the architecture search (Zoph & Le, 2016). NAS becomes particularly challenging when working with small datasets. Transfer learning is a widely adopted technique that reduces the demand for both large computational resources and training data while providing promising performance on small datasets.
Many machine learning algorithms assume that the training (source) data and the future (target) data follow a similar distribution. In this direction, many metric-learning algorithms (Dong et al., 2017, Hu et al., 2015) have been proposed in the literature. To mention a few, Hu et al. (2015) introduced the Deep Transfer Metric Learning (DTML) method, in which a discriminative distance network is trained for cross-dataset visual recognition by maximizing the inter-class variations and minimizing the intra-class variations. Similarly, Dong et al. (2017) proposed Ensemble Discriminative Local Metric Learning (EDLML), which aims at learning a sub-space that keeps all the intra-class samples as close as possible while keeping samples that belong to different classes well separated. Shi, Du, and Zhang (2015) presented a semi-supervised domain adaptation approach that finds new representations of the images belonging to the source domain using multiple linear transformations.
Typically, while transferring the learned knowledge from the source task to the target task, the classification layer of the pre-trained CNN is dropped and a new softmax layer is stacked in its place, which is trained over the target dataset during transfer learning. The number of layers to be fine-tuned is decided mainly by the size of the target dataset and the similarity between the source and target datasets (Karpathy & Johnson, 2017). However, the pre-trained CNN model is designed for the source dataset, so it may not perform well over the target dataset.
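The head-replacement step described above can be illustrated with a minimal, framework-free sketch. The `Layer` class, the `adapt_for_target` helper, the layer names, and the values 101 and 1 are all hypothetical and chosen only for illustration; a real pipeline would use a deep learning framework's own model and layer objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer (not a real framework class)."""
    name: str
    units: int               # output width of the layer
    trainable: bool = False


def adapt_for_target(pretrained: List[Layer], num_target_classes: int,
                     num_finetune: int) -> List[Layer]:
    """Drop the source classifier, stack a new softmax head, and unfreeze
    the last `num_finetune` remaining layers for fine-tuning."""
    # Copy every layer except the final (source) classification layer, frozen.
    body = [Layer(l.name, l.units, trainable=False) for l in pretrained[:-1]]
    # Unfreeze only the layers closest to the output.
    start = max(0, len(body) - num_finetune)
    for layer in body[start:]:
        layer.trainable = True
    # The new head has one neuron per target class and is always trained.
    head = Layer("softmax_target", num_target_classes, trainable=True)
    return body + [head]


# Assumed toy network: three feature layers plus a 1000-way source classifier.
source_model = [
    Layer("conv_block_1", 64),
    Layer("conv_block_2", 128),
    Layer("fc_1", 4096),
    Layer("softmax_source", 1000),
]
target_model = adapt_for_target(source_model, num_target_classes=101, num_finetune=1)
for layer in target_model:
    print(layer.name, layer.units, layer.trainable)
```

In this sketch, `num_finetune` plays the role of the "number of layers to be fine-tuned"; how to choose it for a given target dataset is exactly the question the text raises.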
In this paper, we attempt to automatically tune the pre-trained CNN to make it suitable for the target task/dataset. To achieve this objective, we first replace the softmax layer of the pre-trained CNN with a new softmax layer whose number of neurons equals the number of classes in the target dataset. Next, we automatically tune the layers of the CNN using Bayesian Optimization (Frazier, 2018). It is well known that the initial layers of a CNN represent primitive features, such as edges and blobs, which are generic to many tasks. On the other hand, the final layers of a CNN represent features that are very specific to the learning task (Zeiler & Fergus, 2014). Based on this idea, in our work the layers of the CNN are tuned from right to left (i.e., from the final layers to the initial layers) by observing the network performance on the validation data. The results obtained through our experiments indicate that tuning the optimum number of layers with respect to the target task leads to better image classification performance in the context of transfer learning. Fig. 1 shows an overview of the proposed idea of improving the transfer learning process. Next, we provide a survey of the literature in the research area addressed in this study.
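The right-to-left tuning order can be sketched as a simple loop. This is only an illustration under stated assumptions: an exhaustive per-layer search stands in for Bayesian Optimization, the rule of stopping once a layer brings no validation gain is an assumption, and `tune_right_to_left`, `toy_validation_score`, the layer names, and the search space are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def tune_right_to_left(layers: List[str],
                       search_space: Dict[str, List[int]],
                       validation_score: Callable[[Config], float]
                       ) -> Tuple[Config, float]:
    """Tune layers from the final layer toward the initial layers.

    Exhaustive per-layer search is used here as a simple stand-in for
    Bayesian Optimization. Stopping at the first layer that yields no
    improvement is an assumption of this sketch.
    """
    config: Config = {}                      # empty config = untouched network
    best = validation_score(config)
    for name in reversed(layers):            # final layers first
        improved = False
        for option in search_space[name]:
            candidate = {**config, name: option}
            score = validation_score(candidate)
            if score > best:                 # keep only strict improvements
                best, config, improved = score, candidate, True
        if not improved:
            break                            # earlier layers are left as they are
    return config, best


def toy_validation_score(config: Config) -> float:
    """Hypothetical objective; a real one would train and validate a CNN."""
    preferred = {"fc_1": 1024, "conv_block_2": 256}
    return sum(1.0 for layer, value in config.items()
               if preferred.get(layer) == value)


layers = ["conv_block_1", "conv_block_2", "fc_1"]
space = {"conv_block_1": [32, 64], "conv_block_2": [128, 256], "fc_1": [512, 1024, 2048]}
best_config, best_score = tune_right_to_left(layers, space, toy_validation_score)
print(best_config, best_score)
```

In this toy run the search changes the two layers nearest the output and leaves `conv_block_1` untouched, mirroring the intuition that the initial layers hold generic features.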
Section snippets
Related works
The hierarchical feature extraction ability of deep neural networks makes a pre-trained deep network reusable across tasks. A network pre-trained over a large-scale dataset (source task) can be utilized to solve a specific task or problem (target task) by transferring the knowledge learned from the source task. Transfer learning has been widely adopted in domains where collecting annotated examples is expensive (labor-intensive) and time-consuming, such as biomedical (Raghu et al., 2020, Shin
Methods
The task of automatically tuning a pre-trained CNN w.r.t. the target dataset can be formulated as a black-box optimization problem where we do not have direct access to the objective function. In this paper, tuning the CNN layers with the knowledge of the target dataset for improved transfer learning is achieved using Bayesian Optimization (Frazier, 2018). Let be the objective function, which is of the form,
The objective of the Bayesian optimization can be represented mathematically
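In the standard Bayesian optimization setup (Frazier, 2018), a black-box objective of this kind is usually written as below. The symbols $f$, $x$, $\mathcal{X}$, and $d$ are assumptions introduced for illustration and may differ from the notation used in the paper.

```latex
f : \mathcal{X} \subseteq \mathbb{R}^{d} \rightarrow \mathbb{R},
\qquad
x^{*} = \operatorname*{arg\,max}_{x \in \mathcal{X}} f(x)
```

Under this reading, $x$ would encode one candidate setting of the tunable layer hyperparameters, $\mathcal{X}$ the search space, and $f(x)$ the validation performance of the network tuned with $x$, which can only be observed by training and evaluating it.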
Architecture tuning search space
Here, we provide a detailed discussion of the search space used for the hyperparameters involved in the different layers of a deep neural network, such as convolution, max-pooling, average-pooling, and dense layers. In our experimental settings, we consider both plain and skip-connection-based CNNs. More concretely, we consider popular CNNs such as VGG-16 (Simonyan & Zisserman, 2014), ResNet-50 (He et al., 2016), and DenseNet-121 (Huang et al., 2017). These networks are originally
Experimental settings
We first describe the hyperparameters, such as the learning rate and optimizer, employed while training the CNNs in Section 5.1. The pre-trained deep neural networks used to tune the CNN w.r.t. the target dataset are discussed in Section 5.2, and the datasets utilized for conducting the experiments are discussed in Section 5.3.
Results and discussion
This section provides a detailed discussion of the classification performance obtained using the proposed AutoTune method. To find better-performing CNNs, experiments are conducted on three benchmark datasets: CalTech-101, CalTech-256, and Stanford Dogs. To find the target-specific CNN layers for improved transfer learning, three popular CNNs, including VGG-16 (Simonyan & Zisserman, 2014), ResNet-50 (He et
Conclusion and future work
We propose a novel framework for automatically tuning a pre-trained CNN w.r.t. the target dataset while transferring the learned knowledge from the source task to the target task. We compare Bayesian and random search strategies for tuning the network hyperparameters. Experiments are conducted using the VGG-16, ResNet-50, and DenseNet-121 models over the CalTech-101, CalTech-256, and Stanford Dogs datasets. The models are originally trained over the large-scale ImageNet dataset. The
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
We are grateful to NVIDIA Corporation for supporting us by donating an NVIDIA Titan Xp GPU (GPU-900-1G611-2500-000T), which was used for this research.