Simple feature pyramid network for weakly supervised object localization using multi-scale information

Koo, Bongyeong; Choi, Han-Soo; Kang, Myungjoo

doi:10.1007/s11045-021-00778-9

Published: 10 May 2021

Volume 32, pages 1185–1197, (2021)
Multidimensional Systems and Signal Processing

Bongyeong Koo¹,
Han-Soo Choi² &
Myungjoo Kang¹

Abstract

The purpose of weakly supervised object localization (WSOL) is to localize an object using only classification labels. However, most WSOL methods tend to find only a specific part of an object. Further, to compensate for the lack of resources such as bounding box annotations, they introduce optimization problems more complex than the classification problem itself. To make WSOL more efficient, we propose a new architecture that uses a feature pyramid network (FPN) and multi-scale information to simplify the optimization and improve localization. In our proposed model, the FPN produces multi-scale, high-quality feature maps, which are then gathered to perform classification. We can therefore use abundant, high-quality information for localization, which brings two advantages. First, the proposed model improves localization. Second, it does not require solving a complex optimization problem, which alleviates a significant burden such as hyperparameter tuning. We also confirmed through experiments that our proposed method outperforms state-of-the-art methods on the CUB-200-2011 and ILSVRC datasets.
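The pipeline described in the abstract (an FPN builds multi-scale feature maps, which are then gathered for classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature-map sizes, the channel width `d`, the random weights, and the gathering step (global average pooling per level followed by concatenation) are all assumptions chosen for illustration. The top-down pathway follows the general FPN idea of 1x1 lateral projections added to an upsampled coarser level; the 3x3 smoothing convolutions of a full FPN are omitted.

```python
import numpy as np

def lateral(x, w):
    # 1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in).
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    # Nearest-neighbour upsampling of both spatial axes by a factor of 2.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(feats, lateral_ws):
    # feats is ordered fine -> coarse; start from the coarsest level and
    # merge each finer level with the upsampled level above it.
    pyramid = [lateral(feats[-1], lateral_ws[-1])]
    for x, w in zip(reversed(feats[:-1]), reversed(lateral_ws[:-1])):
        pyramid.insert(0, lateral(x, w) + upsample2x(pyramid[0]))
    return pyramid

def classify(pyramid, w_cls):
    # Assumed gathering step: average-pool every level, concatenate the
    # pooled vectors, and apply one linear classifier.
    pooled = np.concatenate([p.mean(axis=(1, 2)) for p in pyramid])
    return w_cls @ pooled

# Hypothetical backbone outputs: (channels, size) per level, fine -> coarse.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((c, s, s)) for c, s in [(8, 8), (16, 4), (32, 2)]]
d = 4  # assumed common channel width of the pyramid
lateral_ws = [rng.standard_normal((d, f.shape[0])) for f in feats]

pyramid = fpn_top_down(feats, lateral_ws)
num_classes = 3  # hypothetical number of classes
w_cls = rng.standard_normal((num_classes, d * len(pyramid)))
logits = classify(pyramid, w_cls)
```

Because the classifier here is a single linear layer over pooled features, training such a model would need only a standard classification loss, which is in the spirit of the simplified optimization the abstract describes; how the paper derives localization maps from the pyramid is not specified in this excerpt.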

Acknowledgements

Myungjoo Kang was supported by the National Research Foundation Grant of Korea (2015R1A5A1009350, 2021R1A2C3010887) and the ICT R&D program of MSIT/IITP (No. 1711117093).

Author information

Authors and Affiliations

Mathematical Science, Seoul National University, GwanakRo 1, Gwanak-Gu, Seoul, 151-747, Korea
Bongyeong Koo & Myungjoo Kang
Research Institute of Mathematics, Seoul National University, Seoul, Republic of Korea
Han-Soo Choi

Corresponding author

Correspondence to Myungjoo Kang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Koo, B., Choi, HS. & Kang, M. Simple feature pyramid network for weakly supervised object localization using multi-scale information. Multidim Syst Sign Process 32, 1185–1197 (2021). https://doi.org/10.1007/s11045-021-00778-9

Received: 12 October 2020
Revised: 26 February 2021
Accepted: 08 April 2021
Published: 10 May 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s11045-021-00778-9
