Abstract
RGB-D salient object detection aims to identify the most visually attractive regions in an RGB image and its corresponding depth image, and has been widely applied in many computer vision tasks. However, two challenges remain: (1) how to quickly and effectively integrate cross-modal features from RGB-D data; and (2) how to mitigate the negative impact of low-quality depth maps. Previous methods mostly employ a two-stream architecture that adopts two backbone networks to process the RGB-D data and ignores the quality of the depth map. In this paper, we propose a guided residual network to address these two issues. On the one hand, instead of employing a pre-trained backbone to handle the depth data, we design a simple and efficient depth branch that uses only one convolutional layer and three residual modules to extract depth features, and we fuse RGB and depth features in a multi-scale manner for refinement with top-down guidance. On the other hand, we assign adaptive weights to depth maps to control the fusion, which mitigates the negative influence of unreliable depth maps. Experimental comparisons with 13 state-of-the-art methods on 7 datasets demonstrate the validity of the proposed approach both quantitatively and qualitatively, especially in efficiency (102 FPS) and compactness (64.2 MB).
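The adaptive-weighting idea in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation: the quality score (normalized variance of the depth map, so a flat, uninformative depth map gets a near-zero weight) and the additive fusion rule are illustrative assumptions, and all function names are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): fuse RGB and depth
# features with an adaptive weight derived from depth-map quality.

def mean(xs):
    return sum(xs) / len(xs)

def depth_quality_weight(depth):
    """Hypothetical quality score in [0, 1): a flat (low-contrast) depth
    map gets a small weight so it contributes little to the fusion."""
    flat = [v for row in depth for v in row]
    mu = mean(flat)
    var = mean([(v - mu) ** 2 for v in flat])
    return var / (var + 1.0)  # squashes variance into [0, 1)

def fuse(rgb_feat, depth_feat, w):
    """Weighted element-wise fusion of two equally sized feature maps."""
    return [[r + w * d for r, d in zip(rr, dr)]
            for rr, dr in zip(rgb_feat, depth_feat)]

# A high-contrast depth map receives a larger weight than a flat one,
# so an unreliable depth map is largely suppressed during fusion.
informative = [[0.0, 1.0], [1.0, 0.0]]
flat = [[0.5, 0.5], [0.5, 0.5]]
w_good = depth_quality_weight(informative)
w_bad = depth_quality_weight(flat)
```

In the actual network the weight would be predicted from learned features rather than a hand-crafted statistic; the sketch only shows the gating mechanism that lets depth quality modulate the cross-modal fusion.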
Acknowledgements
This work was supported by the Natural Science Foundation of China (No. 61802336), Jiangsu Province 7th Projects for Summit Talents in Six Main Industries, Electronic Information Industry (DZXX-149, No.110).
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work.
Cite this article
Wang, J., Chen, S., Lv, X. et al. Guided residual network for RGB-D salient object detection with efficient depth feature learning. Vis Comput 38, 1803–1814 (2022). https://doi.org/10.1007/s00371-021-02106-5