
Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild

Published in: International Journal of Computer Vision

Abstract

With the rapid development of neural architecture search (NAS), researchers have found powerful network architectures for a wide range of vision tasks. Like their manually designed counterparts, automatically searched architectures are expected to transfer freely to different scenarios. This paper formally puts forward this problem, referred to as NAS in the wild, which explores the possibility of finding the optimal architecture on a proxy dataset and then deploying it to mostly unseen scenarios. We instantiate this setting using a currently popular algorithm named differentiable architecture search (DARTS), which often suffers unsatisfactory performance when transferred across different tasks. We argue that the accuracy drop originates from the formulation that uses a super-network for search but a sub-network for re-training. The different properties of these two stages result in a significant optimization gap, and consequently, the architectural parameters “over-fit” the super-network. To alleviate this gap, we present a progressive method that gradually increases the network depth during the search stage, leading to the Progressive DARTS (P-DARTS) algorithm. With a reduced search cost (7 hours on a single GPU), P-DARTS achieves improved performance on both the proxy dataset (CIFAR10) and a few target problems (ImageNet classification, COCO detection, and three ReID benchmarks). Our code is available at https://github.com/chenxin061/pdarts.
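The progressive schedule sketched in the abstract can be illustrated in code: the super-network is deepened stage by stage while the candidate operation set on each edge is pruned, keeping memory usage roughly stable. The stage depths, candidate counts, operation names, and the placeholder `search_stage` routine below are illustrative assumptions, not the paper's exact configuration.

```python
import random

# Hypothetical candidate operation set for each edge of a cell.
CANDIDATE_OPS = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5",
                 "max_pool_3x3", "avg_pool_3x3", "skip_connect", "zero"]

def search_stage(depth, ops):
    """Placeholder for one stage of differentiable search. In the real
    algorithm this trains a super-network with `depth` cells and returns
    the learned architectural weights; here we return random scores."""
    return {edge: {op: random.random() for op in edge_ops}
            for edge, edge_ops in ops.items()}

def progressive_search(candidate_ops, depths=(5, 11, 17), keep=(8, 5, 3)):
    """Deepen the search network and shrink the per-edge search space
    over successive stages, then derive one operation per edge."""
    ops = {edge: list(candidate_ops) for edge in range(14)}  # 14 normal edges
    for depth, n_keep in zip(depths, keep):
        scores = search_stage(depth, ops)
        # Keep only the top-scoring candidates on each edge for the next,
        # deeper stage.
        ops = {edge: sorted(edge_ops, key=lambda op: scores[edge][op],
                            reverse=True)[:n_keep]
               for edge, edge_ops in ops.items()}
    return {edge: edge_ops[0] for edge, edge_ops in ops.items()}
```

Pruning the operation set as the depth grows is what keeps the deeper stages affordable in memory and time; the final dictionary maps each edge to its single surviving operation.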



Notes

  1. We also tried to start with the architectural parameters learned from the previous stage, \({\mathfrak {S}}_{k-1}\), and adjust them according to Eq. 1 so that the weights of the preserved operations still sum to one. This strategy reported slightly lower accuracy. In fact, we find that only an average of 5.3 (out of 14 normal edges) of the most significant operations in \({\mathfrak {S}}_1\) continue to have the largest weight in \({\mathfrak {S}}_2\), and the number increases only slightly to 6.7 from \({\mathfrak {S}}_2\) to \({\mathfrak {S}}_3\); that is to say, deeper architectures may have altered preferences.

  2. Here, we do not change the batch size to fit into the GPU memory, because even under a fixed batch size the GPU memory usage can vary: the set of preserved candidates can differ, and, for example, a convolutional operator occupies more memory than a pooling operator. This is why we need to discuss the stability of GPU memory usage.

  3. The mean test error of these three trials is \(3.61\%\pm 0.21\%\) (the corresponding errors are \(3.43\%\), \(3.51\%\) and \(3.89\%\), respectively).

  4. Individually, the swish activation function reduced the top-1 test error of NASNet-A from \(26.4\%\) to \(25.0\%\) (Ramachandran et al. 2017), the SE module brought a performance gain of \(0.7\%\) (from \(25.5\%\) to \(24.8\%\)) on MnasNet (Tan et al. 2019), and AutoAugment achieved an accuracy gain of \(1.3\%\) on ResNet-50 (Cubuk et al. 2018). With the swish activation function, the SE module and AutoAugment combined, the compound gain is \(2.5\%\) (from \(25.2\%\) of MnasNet-92 to \(22.7\%\) of EfficientNet-B0).
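The weight adjustment mentioned in Note 1 (restricting the architectural weights to the preserved operations and rescaling them so they again sum to one) can be sketched as follows. This is an assumed form of the adjustment, and the operation names are illustrative; the paper's Eq. 1 may differ in detail.

```python
def renormalize(weights, preserved):
    """Keep only the weights of the preserved operations and rescale
    them so they again sum to one (an assumed form of the adjustment
    described in Note 1; the paper's Eq. 1 may differ in detail)."""
    kept = {op: w for op, w in weights.items() if op in preserved}
    total = sum(kept.values())
    return {op: w / total for op, w in kept.items()}

# Example: dropping "skip_connect" rescales the two remaining weights.
weights = {"sep_conv_3x3": 0.4, "max_pool_3x3": 0.4, "skip_connect": 0.2}
adjusted = renormalize(weights, {"sep_conv_3x3", "max_pool_3x3"})
# each preserved weight becomes 0.5
```

Inheriting such renormalized weights across stages is the alternative the note reports as slightly worse than re-initializing the architectural parameters at each stage.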

References

  • Baker, B., Gupta, O., Naik, N., & Raskar, R. (2017). Designing neural network architectures using reinforcement learning. In ICLR.

  • Bi, K., Hu, C., Xie, L., Chen, X., Wei, L., & Tian, Q. (2019). Stabilizing darts with amended gradient estimation on architectural parameters. arXiv:1910.11831.

  • Cai, H., Chen, T., Zhang, W., Yu, Y., & Wang, J. (2018). Efficient architecture search by network transformation. In AAAI.

  • Cai, H., Zhu, L., & Han, S. (2019). ProxylessNAS: Direct neural architecture search on target task and hardware. In ICLR.

  • Chen, X., Xie, L., Wu, J., & Tian, Q. (2019a). Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In ICCV.

  • Chen, Y., Yang, T., Zhang, X., Meng, G., Xiao, X., & Sun, J. (2019b). Detnas: Backbone search for object detection. In NeurIPS.

  • Chu, X., Zhang, B., Xu, R., & Li, J. (2019). Fairnas: Rethinking evaluation fairness of weight sharing neural architecture search. arXiv:1907.01845.

  • Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2018). Autoaugment: Learning augmentation policies from data. arXiv:1805.09501.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.

  • DeVries, T., & Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552.

  • Dong, X., & Yang, Y. (2019a). One-shot neural architecture search via self-evaluated template network. In ICCV.

  • Dong, X., & Yang, Y. (2019b). Searching for a robust neural architecture in four gpu hours. In CVPR.

  • Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., & Tian, Q. (2019). Centernet: Keypoint triplets for object detection. In ICCV.

  • Elsken, T., Metzen, J. H., & Hutter, F. (2018). Neural architecture search: A survey. arXiv:1808.05377.

  • Ghiasi, G., Lin, T. Y., & Le, Q. V. (2019). Nas-fpn: Learning scalable feature pyramid architecture for object detection. In CVPR.

  • Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., & He, K. (2017). Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677.

  • Han, D., Kim, J., & Kim, J. (2017). Deep pyramidal residual networks. In CVPR.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

  • Howard, A., Sandler, M., Chu, G., Chen, L. C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al. (2019). Searching for MobileNetV3. In ICCV.

  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861.

  • Huang, G., Sun, Y., Liu, Z., Sedra, D., & Weinberger, K. Q. (2016). Deep networks with stochastic depth. In ECCV, Springer.

  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR.

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML.

  • Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., Citeseer.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.

  • Larsson, G., Maire, M., & Shakhnarovich, G. (2017). FractalNet: Ultra-deep neural networks without residuals. In ICLR.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

  • Li, J., Ma, A. J., & Yuen, P. C. (2018). Semi-supervised region metric learning for person re-identification. IJCV, 126(8), 855–874.

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In ECCV.

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In CVPR.

  • Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L. J., Fei-Fei, L., Yuille, A., Huang, J., & Murphy, K. (2018a). Progressive neural architecture search. In ECCV.

  • Liu, H., Simonyan, K., Vinyals, O., Fernando, C., & Kavukcuoglu, K. (2018b). Hierarchical representations for efficient architecture search. In ICLR.

  • Liu, H., Simonyan, K., & Yang, Y. (2019a). DARTS: Differentiable architecture search. In ICLR.

  • Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2019b). Deep learning for generic object detection: A survey. IJCV.

  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). Ssd: Single shot multibox detector. In ECCV.

  • Ma, N., Zhang, X., Zheng, H. T., & Sun, J. (2018). ShuffleNet V2: Practical guidelines for efficient cnn architecture design. In ECCV.

  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). Pytorch: An imperative style, high-performance deep learning library. In NeurIPS.

  • Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameter sharing. In ICML.

  • Quan, R., Dong, X., Wu, Y., Zhu, L., & Yang, Y. (2019). Auto-reid: Searching for a part-aware convnet for person re-identification. In ICCV.

  • Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv:1710.05941.

  • Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2018). Regularized evolution for image classifier architecture search. arXiv:1802.01548.

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. IJCV, 115(3), 211–252.

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR.

  • Shu, Y., Wang, W., & Cai, S. (2020). Understanding architectures learnt by cell-based neural architecture search. In ICLR.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.

  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1), 1929–1958.

  • Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. In NIPS.

  • Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2), 99–127.

  • Suganuma, M., Shirakawa, S., & Nagao, T. (2017). A genetic programming approach to designing convolutional neural network architectures. In GECCO.

  • Sun, Y., Zheng, L., Yang, Y., Tian, Q., & Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV.

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In CVPR.

  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In CVPR.

  • Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML.

  • Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In CVPR.

  • Tan, M., Pang, R., & Le, Q. V. (2020). Efficientdet: Scalable and efficient object detection. In CVPR.

  • Tian, Z., Shen, C., Chen, H., & He, T. (2020). Fcos: A simple and strong anchor-free object detector. arXiv:2006.09214.

  • Wang, H., Zhu, X., Gong, S., & Xiang, T. (2018). Person re-identification in identity regression space. IJCV, 126(12), 1288–1310.

  • Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer gan to bridge domain gap for person re-identification. In CVPR.

  • Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., Tian, Y., Vajda, P., Jia, Y., & Keutzer, K. (2019). Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In CVPR.

  • Xie, L., & Yuille, A. (2017). Genetic CNN. In ICCV.

  • Xie, S., Kirillov, A., Girshick, R., & He, K. (2019a). Exploring randomly wired neural networks for image recognition. In ICCV.

  • Xie, S., Zheng, H., Liu, C., & Lin, L. (2019b). SNAS: Stochastic neural architecture search. In ICLR.

  • Xu, Y., Xie, L., Zhang, X., Chen, X., Qi, G. J., Tian, Q., & Xiong, H. (2020). PC-DARTS: Partial channel connections for memory-efficient architecture search. In ICLR.

  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. arXiv:1605.07146.

  • Zela, A., Elsken, T., Saikia, T., Marrakchi, Y., Brox, T., & Hutter, F. (2020). Understanding and robustifying differentiable architecture search. In ICLR.

  • Zhang, X., Zhou, X., Lin, M., & Sun, J. (2018). ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR.

  • Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV.

  • Zheng, X., Ji, R., Tang, L., Zhang, B., Liu, J., & Tian, Q. (2019). Multinomial distribution learning for effective neural architecture search. In ICCV.

  • Zheng, Z., Zheng, L., & Yang, Y. (2017). Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In ICCV.

  • Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2017). Random erasing data augmentation. arXiv:1708.04896.

  • Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. In ICLR.

  • Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In CVPR.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61831018 and 61631017, and by the Guangdong Province Key Research and Development Program Major Science and Technology Projects under Grant 2018B010115002.

Author information


Corresponding author

Correspondence to Jun Wu.

Additional information

Communicated by Mei Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, X., Xie, L., Wu, J. et al. Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild. Int J Comput Vis 129, 638–655 (2021). https://doi.org/10.1007/s11263-020-01396-x

