Abstract
Object detection based on convolutional neural networks (CNNs) requires large amounts of data to be trained effectively. Data augmentation techniques generate additional training data, which enhances the generalization ability and robustness of the detection network. For object detection in thermal infrared (TIR) images, objects are difficult to label because of heavy noise and low resolution, so data augmentation is especially valuable. However, traditional augmentation strategies (such as image flipping and random color jittering) produce only limited variation in the training samples. To generate high-resolution images that follow the distribution of real samples, a generative adversarial network (GAN) is introduced. Image pyramids are fed into different branches of the network, and the resulting cascade features are fused to gradually increase the resolution. To improve the discriminative capability of the discriminator, a feature matching loss is computed during training, and the generated images at different resolutions are discriminated in multiple stages. The data augmentation algorithm proposed in this paper is called cascade pyramid generative adversarial network (CPGAN). On both the KAIST Multispectral data set and the OSU thermal-color data set, CPGAN substantially improves the detection accuracy of classical detection algorithms. In addition, detection speed is entirely unaffected because CPGAN is used only during the training phase.
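Two ingredients described above — the multi-scale image pyramid fed to separate branches, and the feature matching loss computed from discriminator features — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the pyramid depth, 2x average-pooling downsampling, and the simple L1 feature matching formulation are illustrative assumptions.

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Build an input image pyramid by repeated 2x average pooling.

    Each level would feed a separate branch in a cascade-pyramid generator.
    `img` is an (H, W, C) array; levels halve the spatial size each step.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape[:2]
        cur = pyramid[-1][:h - h % 2, :w - w % 2]          # crop to even size
        cur = cur.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
        pyramid.append(cur)
    return pyramid

def feature_matching_loss(real_feats, fake_feats):
    """Mean L1 distance between discriminator features of real and generated
    images, averaged over the discriminator layers/stages supplied."""
    return float(np.mean([np.abs(r - f).mean()
                          for r, f in zip(real_feats, fake_feats)]))
```

In a full GAN training loop, `real_feats` and `fake_feats` would be the intermediate activations the discriminator produces for real and generated images at each resolution stage; minimizing this term encourages the generator to match real feature statistics rather than only fooling the final classification output.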
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61871024, the Key Science Projects of Shanxi Province No. 201903D03111114, and Science and technology project of Shanxi Jinzhong Development Zone.
Dai, X., Yuan, X. & Wei, X. Data augmentation for thermal infrared object detection with cascade pyramid generative adversarial network. Appl Intell 52, 967–981 (2022). https://doi.org/10.1007/s10489-021-02445-9