
NAS-FCOS: Efficient Search for Object Detection Architectures

  • Published in: International Journal of Computer Vision

Abstract

Neural Architecture Search (NAS) has shown great potential in reducing the manual effort of network design by automatically discovering optimal architectures. Notably, object detection has so far received relatively little attention from NAS research despite its importance in computer vision. To the best of our knowledge, most recent NAS studies on object detection fail to strike a satisfactory balance between the performance and efficiency of the resulting models, and the search algorithms themselves often consume excessive computational resources. Here we propose an efficient method to obtain better object detectors by searching for the feature pyramid network as well as the prediction head of a simple anchor-free object detector, namely FCOS (Tian et al. in FCOS: Fully convolutional one-stage object detection, 2019), using a tailored reinforcement learning paradigm. With a carefully designed search space, search algorithm, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs. The discovered architectures surpass state-of-the-art object detection models (such as Faster R-CNN, RetinaNet, and FCOS) by 1.0 to 5.4 points in AP on the COCO dataset, with comparable computational complexity and memory footprint, demonstrating the efficacy of the proposed NAS method for object detection. Code is available at https://github.com/Lausannen/NAS-FCOS.
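To make the search paradigm described above concrete, the following is a minimal, self-contained sketch of a reinforcement-learning architecture search loop: a controller samples candidate structures from a discrete search space, each candidate is scored by a fast proxy reward, and the controller's sampling distribution is pushed toward higher-reward choices via a REINFORCE-style update with a moving-average baseline. All names here (`OPS`, `proxy_reward`, the tabular "controller") are hypothetical simplifications for illustration only; they are not the paper's actual search space, reward, or PPO-based controller.

```python
import math
import random

random.seed(0)

OPS = ["conv3x3", "sep_conv3x3", "dconv3x3", "skip"]  # toy candidate ops
NUM_SLOTS = 4  # number of decisions per sampled architecture
LR = 0.1       # controller learning rate

# Tabular "policy": one preference score per (slot, op) pair.
prefs = [[0.0] * len(OPS) for _ in range(NUM_SLOTS)]

def sample_arch():
    """Sample one op index per slot, proportional to exp(preference)."""
    arch = []
    for slot in range(NUM_SLOTS):
        weights = [math.exp(p) for p in prefs[slot]]
        total = sum(weights)
        r, acc = random.random() * total, 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                arch.append(i)
                break
    return arch

def proxy_reward(arch):
    """Stand-in for a fast proxy evaluation (e.g. a short training run).
    Here we simply pretend 'sep_conv3x3' is the best op everywhere."""
    return sum(1.0 for op in arch if OPS[op] == "sep_conv3x3") / NUM_SLOTS

baseline = 0.0
for step in range(300):
    arch = sample_arch()
    reward = proxy_reward(arch)
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    advantage = reward - baseline
    for slot, op in enumerate(arch):          # REINFORCE-style update
        prefs[slot][op] += LR * advantage

# Greedy read-out of the searched architecture.
best = [max(range(len(OPS)), key=lambda i: prefs[s][i]) for s in range(NUM_SLOTS)]
print([OPS[i] for i in best])
```

In this toy setting the controller concentrates its preference on the op the proxy rewards; the real method replaces the tabular policy with a learned controller, the proxy with partially trained detectors, and the update with proximal policy optimization.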


References

  • Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.

  • Cai, H., Zhu, L., & Han, S. (2019). ProxylessNAS: Direct neural architecture search on target task and hardware. In Proceedings of international conference learning representations.

  • Chen, B., Ghiasi, G., Liu, H., Lin, T. Y., Kalenichenko, D., Adam, H., & Le, Q. V. (2020). MnasFPN: Learning latency-aware pyramid architecture for object detection on mobile devices. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13607–13616.

  • Chen, Y., Yang, T., Zhang, X., Meng, G., Pan, C., & Sun, J. (2019). DetNAS: Neural architecture search on object detection. In Proceedings of advances in neural information processing systems.

  • Du, X., Lin, T. Y., Jin, P., Ghiasi, G., Tan, M., Cui, Y., Le, Q. V., & Song, X. (2020). SpineNet: Learning scale-permuted backbone for recognition and localization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11592–11601.

  • Elsken, T., Metzen, J.H., & Hutter, F. (2019). Neural architecture search: A survey. The Journal of Machine Learning Research.

  • Ghiasi, G., Lin, T. Y., Pang, R., & Le, Q. V. (2019). NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision.

  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Guo, J., Han, K., Wang, Y., Zhang, C., Yang, Z., Wu, H., Chen, X., & Xu, C. (2020a). Hit-Detector: Hierarchical trinity architecture search for object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11405–11414.

  • Guo, Z., Zhang, X., Mu, H., Heng, W., Liu, Z., Wei, Y., & Sun, J. (2020b). Single path one-shot neural architecture search with uniform sampling. In Proceedings of European conference on computer vision, pp. 544–560.

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision, pp. 2961–2969.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity mappings in deep residual networks. In Proceedings of European conference on computer vision, pp. 630–645.

  • Jiang, C., Xu, H., Zhang, W., Liang, X., & Li, Z. (2020). SP-NAS: Serial-to-parallel backbone search for object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11863–11872.

  • Kirillov, A., Girshick, R., He, K., & Dollár, P. (2019). Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Kong, T., Sun, F., Liu, H., Jiang, Y., & Shi, J. (2019). FoveaBox: Beyond anchor-based object detector. arXiv preprint arXiv:1904.03797.

  • Law, H., & Deng, J. (2018). CornerNet: Detecting objects as paired keypoints. In Proceedings of the European conference on computer vision, pp. 734–750.

  • Liang, F., Lin, C., Guo, R., Sun, M., Wu, W., Yan, J., & Ouyang, W. (2020). Computation reallocation for object detection. In Proceedings of international conference on learning representations.

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125.

  • Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp. 2980–2988.

  • Liu, C., Chen, L. C., Schroff, F., Adam, H., Hua, W., Yuille, A., & Fei-Fei, L. (2019). Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Liu, H., Peng, C., Yu, C., Wang, J., Liu, X., Yu, G., & Jiang, W. (2019). An end-to-end network for panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Liu, H., Simonyan, K., & Yang, Y. (2019). DARTS: Differentiable architecture search. In Proceedings of international conference on learning representations.

  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. (2016). SSD: Single shot multibox detector. In Proceedings of European conference on computer vision, pp. 21–37.

  • Nekrasov, V., Chen, H., Shen, C., & Reid, I. (2019). Fast neural architecture search of compact semantic segmentation models via auxiliary cells. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Peng, J., Sun, M., Zhang, Z. X., Tan, T., & Yan, J. (2019). Efficient neural architecture transformation search in channel-level for object detection. In Proceedings of advances in neural information processing system, pp. 14313–14322.

  • Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameter sharing. In Proceedings of international conference on machine learning.

  • Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.

  • Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp. 91–99.

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520.

  • Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

  • Stamoulis, D., Ding, R., Wang, D., Lymberopoulos, D., Priyantha, B., Liu, J., & Marculescu, D. (2019). Single-path NAS: Designing hardware-efficient convnets in less than 4 hours. arXiv preprint arXiv:1904.02877.

  • Tan, M., Pang, R., & Le, Q. V. (2020). EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10781–10790.

  • Tian, Z., Shen, C., Chen, H., & He, T. (2019). FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF international conference on computer vision.

  • Uijlings, J., Sande, K. V. D., Gevers, T., & Smeulders, A. (2013). Selective search for object recognition. International Journal of Computer Vision.

  • Wu, Y., & He, K. (2018). Group normalization. In Proceedings of the European conference on computer vision, pp. 3–19.

  • Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Xiong, Y., Liu, H., Gupta, S., Akin, B., Bender, G., Kindermans, P.J., Tan, M., Singh, V., & Chen, B. (2020). MobileDets: Searching for object detection architectures for mobile accelerators. arXiv preprint arXiv:2004.14525

  • Xu, H., Yao, L., Zhang, W., Liang, X., & Li, Z. (2019). Auto-FPN: Automatic network architecture adaptation for object detection beyond classification. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 6649–6658.

  • Yang, Z., Xu, Y., Xue, H., Zhang, Z., Urtasun, R., Wang, L., et al. (2020). Dense RepPoints: Representing visual objects with dense point sets. In Proceedings of the European conference on computer vision.

  • Yao, L., Xu, H., Zhang, W., Liang, X., & Li, Z. (2020). SM-NAS: Structural-to-modular neural architecture search for object detection. In Proceedings of the AAAI conference on artificial intelligence.

  • Zhao, T., & Wu, X. (2019). Pyramid feature attention network for saliency detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Zhong, Y., Deng, Z., Guo, S., Scott, M. R., & Huang, W. (2020). Representation sharing for fast object detector search and beyond. In Proceedings of the European conference on computer vision.

  • Zhou, H., Yang, M., Wang, J., & Pan, W. (2019). BayesNAS: A Bayesian approach for neural architecture search. In Proceedings of international conference on machine learning.

  • Zhou, X., Wang, D., & Krähenbühl, P. (2019). Objects as points. arXiv preprint arXiv:1904.07850.

  • Zhu, C., He, Y., & Savvides, M. (2019). Feature selective anchor-free module for single-shot object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Zhu, X., Hu, H., Lin, S., & Dai, J. (2019). Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. In Proceedings of international conference on learning representations.

  • Zoph, B., Vasudevan, V., Jonathon, S., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition.

Acknowledgements

NW, YG, PW and YZ’s participation in this work was supported by the National Key R&D Program of China (No. 2020AAA0106900), and the National Natural Science Foundation of China (Nos. U19B2037, 61876152).

Author information

Corresponding author

Correspondence to Peng Wang.

Additional information

Communicated by Deva Ramanan.

About this article

Cite this article

Wang, N., Gao, Y., Chen, H. et al. NAS-FCOS: Efficient Search for Object Detection Architectures. Int J Comput Vis 129, 3299–3312 (2021). https://doi.org/10.1007/s11263-021-01523-2
