Abstract
Object detection based on convolutional neural networks (CNNs) requires large amounts of data to be trained effectively. Data augmentation techniques generate additional training data, which enhances the generalization ability and robustness of the detection network. For object detection in thermal infrared (TIR) images, objects are difficult to label because of heavy noise and low resolution, so data augmentation is especially valuable. However, traditional augmentation strategies (such as image flipping and random color jittering) produce only limited variation in the training samples. To generate high-resolution images that follow the distribution of real samples, a generative adversarial network (GAN) is introduced. Image pyramids are fed into different branches of the network, and the resulting cascade features are fused to gradually increase the resolution. To improve the discriminative capability of the discriminator, a feature matching loss is computed during training, and the generated images at different resolutions are discriminated in multiple stages. The data augmentation algorithm proposed in this paper is called cascade pyramid generative adversarial network (CPGAN). On both the KAIST Multispectral data set and the OSU thermal-color data set, CPGAN substantially improves the detection accuracy of classical detection algorithms. In addition, detection speed is entirely unaffected because CPGAN is used only during the training phase.
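Two ingredients described above — the multi-scale image pyramid fed to separate branches, and the feature matching loss computed from discriminator features — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the pyramid depth, 2x average-pooling downsampling, and the simple L1 feature matching formulation are illustrative assumptions.

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Build an input image pyramid by repeated 2x average pooling.

    Each level would feed a separate branch in a cascade-pyramid generator.
    `img` is an (H, W, C) array; levels halve the spatial size each step.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape[:2]
        cur = pyramid[-1][:h - h % 2, :w - w % 2]          # crop to even size
        cur = cur.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
        pyramid.append(cur)
    return pyramid

def feature_matching_loss(real_feats, fake_feats):
    """Mean L1 distance between discriminator features of real and generated
    images, averaged over the discriminator layers/stages supplied."""
    return float(np.mean([np.abs(r - f).mean()
                          for r, f in zip(real_feats, fake_feats)]))
```

In a full GAN training loop, `real_feats` and `fake_feats` would be the intermediate activations the discriminator produces for real and generated images at each resolution stage; minimizing this term encourages the generator to match real feature statistics rather than only fooling the final classification output.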
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61871024, the Key Science Projects of Shanxi Province No. 201903D03111114, and Science and technology project of Shanxi Jinzhong Development Zone.
Dai, X., Yuan, X. & Wei, X. Data augmentation for thermal infrared object detection with cascade pyramid generative adversarial network. Appl Intell 52, 967–981 (2022). https://doi.org/10.1007/s10489-021-02445-9