
Scale-Aware Domain Adaptive Faster R-CNN

Published in: International Journal of Computer Vision

Abstract

Object detection typically assumes that training and test samples are drawn from the same distribution, an assumption that does not always hold in practice. Such a distribution mismatch can lead to a significant performance drop. In this work, we present Scale-aware Domain Adaptive Faster R-CNN, a model designed to improve the cross-domain robustness of object detection. In particular, our model extends the standard Faster R-CNN by tackling domain shift at two levels: (1) the image-level shift, such as image style and illumination, and (2) the instance-level shift, such as object appearance and size. The two domain adaptation modules are implemented as domain classifiers trained in an adversarial manner. Moreover, we observe that large variance in object scale often poses a crucial challenge to cross-domain object detection, so we further improve our model by explicitly incorporating object scale into the adversarial training. We evaluate the proposed model on multiple cross-domain scenarios, including object detection in adverse weather, learning from synthetic data, and cross-camera adaptation, and it outperforms baselines and competing methods by a significant margin. These promising results demonstrate the effectiveness of our model for cross-domain object detection. The implementation is available at https://github.com/yuhuayc/sa-da-faster.
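For intuition, the sketch below shows how adversarial domain classifiers of this kind are commonly attached to a two-stage detector through a gradient reversal layer: an image-level classifier operates on the backbone feature map, and an instance-level classifier operates on the ROI-pooled features. This is a minimal PyTorch illustration under assumed names and dimensions (GradReverse, lambd, in_channels=512, in_dim=4096), not the authors' released code; see the repository linked above for the actual implementation.

    import torch
    import torch.nn as nn


    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; multiplies gradients by -lambd in the backward
        # pass, so the feature extractor learns to fool the domain classifiers.
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output.neg() * ctx.lambd, None


    class ImageLevelDomainClassifier(nn.Module):
        # Patch-wise source/target classifier on the backbone feature map (image-level shift).
        def __init__(self, in_channels=512, lambd=0.1):
            super().__init__()
            self.lambd = lambd
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 256, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(256, 1, kernel_size=1),  # one domain logit per spatial location
            )

        def forward(self, feat):  # feat: (N, in_channels, H, W)
            return self.net(GradReverse.apply(feat, self.lambd))


    class InstanceLevelDomainClassifier(nn.Module):
        # Per-region source/target classifier on ROI-pooled features (instance-level shift).
        def __init__(self, in_dim=4096, lambd=0.1):
            super().__init__()
            self.lambd = lambd
            self.net = nn.Sequential(
                nn.Linear(in_dim, 1024), nn.ReLU(inplace=True), nn.Dropout(),
                nn.Linear(1024, 1),  # one domain logit per region proposal
            )

        def forward(self, roi_feat):  # roi_feat: (num_rois, in_dim)
            return self.net(GradReverse.apply(roi_feat, self.lambd))

In a setup like this, both classifiers would be trained with a binary cross-entropy loss against the domain label (source vs. target) on images from both domains, while the usual detection losses are computed only on labeled source images; the reversed gradient pushes the shared features toward domain invariance.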



Acknowledgements

This work is partially supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400. The authors would like to acknowledge support by Armasuisse and Toyota TRACE, and thank AWS for providing cloud credits.

Author information


Corresponding author

Correspondence to Wen Li.

Additional information

Communicated by Minsu Cho.

Recommended by: Yasuyuki Matsushita.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article


Cite this article

Chen, Y., Wang, H., Li, W. et al. Scale-Aware Domain Adaptive Faster R-CNN. Int J Comput Vis 129, 2223–2243 (2021). https://doi.org/10.1007/s11263-021-01447-x

