
Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild

Published in: International Journal of Computer Vision

Abstract

With the rapid development of neural architecture search (NAS), researchers have found powerful network architectures for a wide range of vision tasks. Like their manually designed counterparts, automatically searched architectures are expected to transfer freely to different scenarios. This paper formally puts forward this problem, referred to as NAS in the wild, which explores the possibility of finding the optimal architecture on a proxy dataset and then deploying it to mostly unseen scenarios. We instantiate this setting using a currently popular algorithm named differentiable architecture search (DARTS), which often suffers unsatisfactory performance when transferred across different tasks. We argue that the accuracy drop originates from the formulation that uses a super-network for search but a sub-network for re-training. The different properties of these two stages result in a significant optimization gap, and consequently, the architectural parameters “over-fit” the super-network. To alleviate this gap, we present a progressive method that gradually increases the network depth during the search stage, leading to the Progressive DARTS (P-DARTS) algorithm. With a reduced search cost (7 hours on a single GPU), P-DARTS achieves improved performance on both the proxy dataset (CIFAR10) and a few target problems (ImageNet classification, COCO detection, and three ReID benchmarks). Our code is available at https://github.com/chenxin061/pdarts.
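The progressive schedule sketched in the abstract can be illustrated in code: the super-network is deepened stage by stage while the candidate operation set on each edge is pruned, keeping memory usage roughly stable. The stage depths, candidate counts, operation names, and the placeholder `search_stage` routine below are illustrative assumptions, not the paper's exact configuration.

```python
import random

# Hypothetical candidate operation set for each edge of a cell.
CANDIDATE_OPS = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5",
                 "max_pool_3x3", "avg_pool_3x3", "skip_connect", "zero"]

def search_stage(depth, ops):
    """Placeholder for one stage of differentiable search. In the real
    algorithm this trains a super-network with `depth` cells and returns
    the learned architectural weights; here we return random scores."""
    return {edge: {op: random.random() for op in edge_ops}
            for edge, edge_ops in ops.items()}

def progressive_search(candidate_ops, depths=(5, 11, 17), keep=(8, 5, 3)):
    """Deepen the search network and shrink the per-edge search space
    over successive stages, then derive one operation per edge."""
    ops = {edge: list(candidate_ops) for edge in range(14)}  # 14 normal edges
    for depth, n_keep in zip(depths, keep):
        scores = search_stage(depth, ops)
        # Keep only the top-scoring candidates on each edge for the next,
        # deeper stage.
        ops = {edge: sorted(edge_ops, key=lambda op: scores[edge][op],
                            reverse=True)[:n_keep]
               for edge, edge_ops in ops.items()}
    return {edge: edge_ops[0] for edge, edge_ops in ops.items()}
```

Pruning the operation set as the depth grows is what keeps the deeper stages affordable in memory and time; the final dictionary maps each edge to its single surviving operation.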



Notes

  1. We also tried to start with the architectural parameters learned from the previous stage, \({\mathfrak {S}}_{k-1}\), and adjust them according to Eq. 1 so that the weights of the preserved operations still sum to one. This strategy reported slightly lower accuracy. In fact, we find that only an average of 5.3 (out of 14 normal edges) of the most significant operations in \({\mathfrak {S}}_1\) continue to have the largest weight in \({\mathfrak {S}}_2\), and the number increases only slightly to 6.7 from \({\mathfrak {S}}_2\) to \({\mathfrak {S}}_3\); that is to say, deeper architectures may have altered preferences.

  2. Here, we do not change the batch size to fit into the GPU memory, because even under a fixed batch size the GPU memory usage can vary: the set of preserved candidates can differ, and, for example, a convolutional operator occupies more memory than a pooling operator. This is why we need to discuss the stability of GPU memory usage.

  3. The mean test error of these three trials is \(3.61\%\pm 0.21\%\) (the corresponding errors are \(3.43\%\), \(3.51\%\) and \(3.89\%\), respectively).

  4. Individually, the swish activation function reduced the top-1 test error of NASNet-A from \(26.4\%\) to \(25.0\%\) (Ramachandran et al. 2017), the SE module brought a performance gain of \(0.7\%\) (from \(25.5\%\) to \(24.8\%\)) on MnasNet (Tan et al. 2019), and AutoAugment achieved an accuracy gain of \(1.3\%\) on ResNet-50 (Cubuk et al. 2018). With the swish activation function, the SE module and AutoAugment combined, the compound gain is \(2.5\%\) (from \(25.2\%\) of MnasNet-92 to \(22.7\%\) of EfficientNet-B0).
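The weight adjustment mentioned in Note 1 (restricting the architectural weights to the preserved operations and rescaling them so they again sum to one) can be sketched as follows. This is an assumed form of the adjustment, and the operation names are illustrative; the paper's Eq. 1 may differ in detail.

```python
def renormalize(weights, preserved):
    """Keep only the weights of the preserved operations and rescale
    them so they again sum to one (an assumed form of the adjustment
    described in Note 1; the paper's Eq. 1 may differ in detail)."""
    kept = {op: w for op, w in weights.items() if op in preserved}
    total = sum(kept.values())
    return {op: w / total for op, w in kept.items()}

# Example: dropping "skip_connect" rescales the two remaining weights.
weights = {"sep_conv_3x3": 0.4, "max_pool_3x3": 0.4, "skip_connect": 0.2}
adjusted = renormalize(weights, {"sep_conv_3x3", "max_pool_3x3"})
# each preserved weight becomes 0.5
```

Inheriting such renormalized weights across stages is the alternative the note reports as slightly worse than re-initializing the architectural parameters at each stage.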

References

  • Baker, B., Gupta, O., Naik, N., & Raskar, R. (2017). Designing neural network architectures using reinforcement learning. In ICLR.

  • Bi, K., Hu, C., Xie, L., Chen, X., Wei, L., & Tian, Q. (2019). Stabilizing darts with amended gradient estimation on architectural parameters. arXiv:1910.11831.

  • Cai, H., Chen, T., Zhang, W., Yu, Y., & Wang, J. (2018). Efficient architecture search by network transformation. In AAAI.

  • Cai, H., Zhu, L., & Han, S. (2019). ProxylessNAS: Direct neural architecture search on target task and hardware. In ICLR.

  • Chen, X., Xie, L., Wu, J., & Tian, Q. (2019a). Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In ICCV.

  • Chen, Y., Yang, T., Zhang, X., Meng, G., Xiao, X., & Sun, J. (2019b). Detnas: Backbone search for object detection. In NeurIPS.

  • Chu, X., Zhang, B., Xu, R., & Li, J. (2019). Fairnas: Rethinking evaluation fairness of weight sharing neural architecture search. arXiv:1907.01845.

  • Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2018). Autoaugment: Learning augmentation policies from data. arXiv:1805.09501.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.

  • DeVries, T., & Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552.

  • Dong, X., & Yang, Y. (2019a). One-shot neural architecture search via self-evaluated template network. In ICCV.

  • Dong, X., & Yang, Y. (2019b). Searching for a robust neural architecture in four gpu hours. In CVPR.

  • Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., & Tian, Q. (2019). Centernet: Keypoint triplets for object detection. In ICCV.

  • Elsken, T., Metzen, J. H., & Hutter, F. (2018). Neural architecture search: A survey. arXiv:1808.05377.

  • Ghiasi, G., Lin, T. Y., & Le, Q. V. (2019). Nas-fpn: Learning scalable feature pyramid architecture for object detection. In CVPR.

  • Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., & He, K. (2017). Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677.

  • Han, D., Kim, J., & Kim, J. (2017). Deep pyramidal residual networks. In CVPR.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

  • Howard, A., Sandler, M., Chu, G., Chen, L. C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al. (2019). Searching for MobileNetV3. In ICCV.

  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861.

  • Huang, G., Sun, Y., Liu, Z., Sedra, D., & Weinberger, K. Q. (2016). Deep networks with stochastic depth. In ECCV, Springer.

  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR.

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML.

  • Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., Citeseer.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.

  • Larsson, G., Maire, M., & Shakhnarovich, G. (2017). FractalNet: Ultra-deep neural networks without residuals. In ICLR.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

  • Li, J., Ma, A. J., & Yuen, P. C. (2018). Semi-supervised region metric learning for person re-identification. IJCV, 126(8), 855–874.

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In ECCV.

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In CVPR.

  • Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L. J., Fei-Fei, L., Yuille, A., Huang, J., & Murphy, K. (2018a). Progressive neural architecture search. In ECCV.

  • Liu, H., Simonyan, K., Vinyals, O., Fernando, C., & Kavukcuoglu, K. (2018b). Hierarchical representations for efficient architecture search. In ICLR.

  • Liu, H., Simonyan, K., & Yang, Y. (2019a). DARTS: Differentiable architecture search. In ICLR.

  • Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2019b). Deep learning for generic object detection: A survey. IJCV.

  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). Ssd: Single shot multibox detector. In ECCV.

  • Ma, N., Zhang, X., Zheng, H. T., & Sun, J. (2018). ShuffleNet V2: Practical guidelines for efficient cnn architecture design. In ECCV.

  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). Pytorch: An imperative style, high-performance deep learning library. In NeurIPS.

  • Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameter sharing. In ICML.

  • Quan, R., Dong, X., Wu, Y., Zhu, L., & Yang, Y. (2019). Auto-reid: Searching for a part-aware convnet for person re-identification. In ICCV.

  • Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv:1710.05941.

  • Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2018). Regularized evolution for image classifier architecture search. arXiv:1802.01548.

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. IJCV, 115(3), 211–252.

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR.

  • Shu, Y., Wang, W., & Cai, S. (2020). Understanding architectures learnt by cell-based neural architecture search. In ICLR.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.

  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1), 1929–1958.

  • Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. In NIPS.

  • Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2), 99–127.

  • Suganuma, M., Shirakawa, S., & Nagao, T. (2017). A genetic programming approach to designing convolutional neural network architectures. In GECCO.

  • Sun, Y., Zheng, L., Yang, Y., Tian, Q., & Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV.

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In CVPR.

  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In CVPR.

  • Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML.

  • Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In CVPR.

  • Tan, M., Pang, R., & Le, Q. V. (2020). Efficientdet: Scalable and efficient object detection. In CVPR.

  • Tian, Z., Shen, C., Chen, H., & He, T. (2020). Fcos: A simple and strong anchor-free object detector. arXiv:2006.09214.

  • Wang, H., Zhu, X., Gong, S., & Xiang, T. (2018). Person re-identification in identity regression space. IJCV, 126(12), 1288–1310.

  • Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer gan to bridge domain gap for person re-identification. In CVPR.

  • Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., Tian, Y., Vajda, P., Jia, Y., & Keutzer, K. (2019). Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In CVPR.

  • Xie, L., & Yuille, A. (2017). Genetic CNN. In ICCV.

  • Xie, S., Kirillov, A., Girshick, R., & He, K. (2019a). Exploring randomly wired neural networks for image recognition. In ICCV.

  • Xie, S., Zheng, H., Liu, C., & Lin, L. (2019b). SNAS: Stochastic neural architecture search. In ICLR.

  • Xu, Y., Xie, L., Zhang, X., Chen, X., Qi, G. J., Tian, Q., & Xiong, H. (2020). PC-DARTS: Partial channel connections for memory-efficient architecture search. In ICLR.

  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. arXiv:1605.07146.

  • Zela, A., Elsken, T., Saikia, T., Marrakchi, Y., Brox, T., & Hutter, F. (2020). Understanding and robustifying differentiable architecture search. In ICLR.

  • Zhang, X., Zhou, X., Lin, M., & Sun, J. (2018). ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR.

  • Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV.

  • Zheng, X., Ji, R., Tang, L., Zhang, B., Liu, J., & Tian, Q. (2019). Multinomial distribution learning for effective neural architecture search. In ICCV.

  • Zheng, Z., Zheng, L., & Yang, Y. (2017). Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In ICCV.

  • Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2017). Random erasing data augmentation. arXiv:1708.04896.

  • Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. In ICLR.

  • Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In CVPR.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61831018 and 61631017, and by the Guangdong Province Key Research and Development Program Major Science and Technology Projects under Grant 2018B010115002.

Author information


Corresponding author

Correspondence to Jun Wu.

Additional information

Communicated by Mei Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, X., Xie, L., Wu, J. et al. Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild. Int J Comput Vis 129, 638–655 (2021). https://doi.org/10.1007/s11263-020-01396-x

