Abstract
Domain transfer has become an active topic in object detection: ideally, a model trained on a label-rich source domain should apply to label-poor or unlabeled target domains, saving substantial annotation time and effort. However, the source and target distributions are rarely matched, and such a mismatch sharply degrades transfer performance. In this work, to improve object detection under domain transfer, we tackle domain shift on two levels: (1) contour-level shift, covering appearance, shape, and size, and (2) material-level shift, covering texture, shade, and color. We apply a different alignment strategy to each level: full alignment for contour-level adaptation and selective alignment for material-level adaptation. We build our domain adaptation framework on the single shot multibox detector (SSD), a state-of-the-art detector that stands out among recent approaches for its real-time speed and effectiveness. To alleviate domain discrepancy, we design two domain adapters, one at the contour level and one at the material level. Since aligning the distributions of source and target images with an adversarial loss has proven effective, each adapter is implemented as a domain classifier trained in an adversarial manner, and the classifiers at the two levels are further reinforced with a consistency regularization within the SSD model. We empirically verify the effectiveness of our method, which outperforms three other state-of-the-art methods by a large margin of 5–10% in mean average precision (mAP) on various datasets, in both similar and dissimilar domain-shift scenarios.
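The two losses named in the abstract can be illustrated compactly. The sketch below is not the paper's implementation: the function names and the exact form of the losses are assumptions. It shows (a) the binary cross-entropy a domain classifier minimizes while the feature extractor, via gradient reversal, maximizes it, and (b) an L2 consistency penalty that encourages the contour-level and material-level domain classifiers to agree on each image's domain.

```python
import numpy as np

def domain_bce(p, is_target):
    """Binary cross-entropy for a domain classifier output p in (0, 1).

    Convention assumed here: label 1 for target-domain samples,
    label 0 for source-domain samples. In adversarial training the
    classifier minimizes this loss while the detector's backbone
    (through a gradient reversal layer) maximizes it.
    """
    y = 1.0 if is_target else 0.0
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def consistency_loss(p_contour, p_material):
    """L2 penalty tying the per-image domain predictions of the
    contour-level and material-level classifiers to each other."""
    p_c = np.asarray(p_contour, dtype=float)
    p_m = np.asarray(p_material, dtype=float)
    return float(np.mean((p_c - p_m) ** 2))

# A maximally uncertain classifier (p = 0.5) incurs loss ln(2) ~ 0.693,
# and identical level-wise predictions incur zero consistency penalty.
print(domain_bce(0.5, is_target=True))      # ~0.693
print(consistency_loss([0.5, 0.8], [0.5, 0.8]))  # 0.0
```

In practice such losses would be attached to intermediate SSD feature maps via a gradient reversal layer; the weighting between the detection loss, the two adversarial losses, and the consistency term is a tuning choice the abstract does not specify.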
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Jiang, N., Fang, J., Xu, J. et al. SSD based on contour–material level for domain adaptation. Pattern Anal Applic 24, 1221–1229 (2021). https://doi.org/10.1007/s10044-021-00986-w