
Neurocomputing

Volume 437, 21 May 2021, Pages 58-71

BCNet: Bidirectional collaboration network for edge-guided salient object detection

https://doi.org/10.1016/j.neucom.2021.01.034

Abstract

The boundary quality is a key factor determining the success of accurate salient object detection (SOD). A number of edge-guided SOD methods have been proposed to improve the boundary quality, but they have shown unsatisfactory performance due to the lack of a comprehensive consideration of multi-level feature fusion and multi-type feature aggregation. To resolve this issue, we propose a novel Bidirectional Collaboration Network (BCNet), which integrates effective multi-level feature fusion and multi-type feature aggregation into a unified edge-guided SOD framework. Specifically, we first utilize multiple Consistency Saliency Maximization (CSM) modules to propagate the highest-level semantic representations along a top-down progressive pathway, generating both global edge representations and a series of region representations. Multiple Bounded Feature Fusion (BFF) modules are then utilized to refine the region features with the edge features. The CSM and BFF modules enable robust multi-level feature fusion and multi-type feature aggregation with only little extra computation, allowing high computational efficiency. Finally, BCNet is jointly trained with edge and region losses in an end-to-end manner. Extensive comparisons are conducted with 17 state-of-the-art methods on five challenging benchmark datasets. Thanks to the CSM and BFF modules, our BCNet outperforms existing deep learning-based SOD methods, including the latest edge-guided ones, in terms of both detection accuracy and processing speed.

Introduction

Salient object detection (SOD), acting as a powerful pre-processing tool in numerous computer vision tasks, mimics the human visual attention mechanism for identifying attention-grabbing objects in natural images. It has a large number of applications, such as autonomous driving [1], robot navigation [2], visual tracking [3], image retrieval [4], aesthetics assessment [5], and content-aware image editing [6]. Inspired by progress in perceptual psychology, early models detect salient objects using heuristic priors [7], [8] and hand-crafted features such as contrast [9] and distance transformation [10]. However, their detection performance is seriously limited in complex scenarios. Recent works have demonstrated that deep learning techniques, especially Convolutional Neural Networks (CNNs) [11], [12], [13], [14], are particularly good at understanding visual concepts by extracting semantic features from image regions, and have achieved remarkable performance [15], [16], [17], [18]. Despite these advantages, existing methods suffer from two major limitations. First, it remains challenging to detect entire salient objects against a complex background, even with deep learning-based methods. Second, most existing methods are unable to accurately detect the boundaries of salient objects.

To overcome these limitations, a number of methods have been proposed in recent years [19], [20], [21], [22], [23], [24], [25], [26]. For instance, fusing multi-level features from low-level and high-level convolutional layers [20], [21], [27] improves the detection of objects against a complex background. In addition, resorting to additional edge guidance [24], [25], [26], [28] improves the accuracy of boundaries. However, most existing solutions focus on only one of these limitations while overlooking the other. Moreover, although edge-guided methods such as [24], [28] provide encouraging boundary quality, the aggregation of multi-type features, i.e., region and edge features, is achieved by naive concatenation or element-wise addition/multiplication, which can be suboptimal and ineffective.

To this end, we propose a novel bidirectional collaboration network, called BCNet, which integrates effective multi-level feature fusion and multi-type feature aggregation into a unified SOD framework. BCNet utilizes the edge features to guide the region features, which automatically discards low-quality features and highlights more edge details, as shown in Fig. 1. Specifically, we introduce a module, called Consistency Saliency Maximization (CSM), which is inspired by the spatial attention mechanism [29], and embed it into BCNet to mitigate the discrepancy between different levels of features for effective feature fusion. Fig. 1(d) and (e) show the edge feature map and the feature map generated by fusing shallow features (b) and high-level features (c) with our CSM module. As can be observed, after fusion, the entire objects become clear and background noise is suppressed. To improve the edge sharpness and accuracy, we introduce another module, called Bounded Feature Fusion (BFF), which is inspired by the Squeeze-Excitation block in [30], to aggregate the multi-type features provided by the CSM modules. Different from existing methods relying on simple concatenation or element-wise operations [24], [28], BFF utilizes effective feature re-weighting to sharpen the edges (Fig. 1(f)). Finally, BCNet is jointly trained with edge and region losses in an end-to-end manner. It is worth noting that the CSM and BFF modules enable effective multi-level feature fusion and multi-type feature aggregation while inducing only little extra computation, allowing real-time processing at 52 fps; an illustrative sketch of both modules is given after the contribution list below. The main contributions of this paper are as follows:

  • We propose a novel bidirectional collaboration network BCNet for edge-guided SOD, which effectively addresses multi-level feature fusion and multi-type feature aggregation within a unified framework. Accordingly, two modules, called Consistency Saliency Maximization (CSM) and Bounded Feature Fusion (BFF), are introduced.

  • We construct a new bidirectional collaboration architecture for BCNet, where local region features are first organized in a top-down progressive pathway to propagate the highest level semantic representations, and then the global edge features are used to refine the obtained region features for final prediction.

  • Extensive comparisons are conducted with 17 state-of-the-art (SOTA) methods on five challenging benchmark datasets, demonstrating that the proposed BCNet performs favorably against the latest SOTA models in terms of both accuracy and speed. Notably, BCNet achieves a real-time speed of 52 fps, making it one of the fastest models compared with SOTA methods.
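
To make the roles of the two modules concrete, the following minimal sketch illustrates one plausible realization of a CSM-style spatial-attention fusion and a BFF-style squeeze-excitation re-weighting. PyTorch, the layer choices, channel widths, and the exact gating formulation are all our assumptions for illustration; this is not the authors' released implementation.

```python
# Minimal sketch (PyTorch assumed) of plausible CSM- and BFF-style modules.
# Layer choices and gating formulation are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CSM(nn.Module):
    """Spatial-attention-inspired fusion of a shallow (low-level) and a
    deep (high-level) feature map, suppressing background responses."""

    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=7, padding=3)
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low, high):
        # Upsample the deeper features to the shallow resolution.
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear",
                             align_corners=False)
        # A spatial attention map derived from the semantic features gates
        # the shallow features, keeping only consistent salient regions.
        gate = torch.sigmoid(self.attn(high))
        return self.fuse(low * gate + high)


class BFF(nn.Module):
    """Squeeze-Excitation-style re-weighting of concatenated region and
    edge features, replacing naive concatenation or element-wise ops."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, region, edge):
        x = torch.cat([region, edge], dim=1)
        b, c, _, _ = x.shape
        # Channel re-weighting: squeeze to a descriptor, excite to weights.
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return self.project(x * w)
```

The sketch mirrors the text: CSM uses an attention map derived from the deeper features to suppress background noise in the shallow ones, while BFF replaces naive concatenation or addition with learned channel re-weighting.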

The rest of the paper is organized as follows. Section 2 describes related works on deep SOD, especially the edge-guided models. Section 3 describes the proposed BCNet in detail. Experimental results, performance evaluation and comparisons are presented in Section 4. Finally, conclusions are drawn in Section 5.

Section snippets

Related works

There are various types of SOD problems, such as RGB SOD [31], [23], [32], [33], [34], [35], RGB-D SOD [36], [37], [38], [39], [40], [41], [42], [43], [44], light field SOD [45], [46], [47], high-resolution SOD [48], video SOD (VSOD) [49], [50], 360° omnidirectional SOD [51], and co-salient object detection (Co-SOD) [52], [53], [54]. In addition, salient objects can also act as negative samples for camouflaged objects [55], [56]. This paper focuses on the RGB SOD problem, which aims to detect salient objects from single RGB images.

Bidirectional collaboration network

Bidirectional Collaboration Network (BCNet) is an edge-guided framework for SOD. It resolves the limitations of existing edge-guided SOD methods by integrating multi-level feature fusion and multi-type feature aggregation in a unified framework. We detail BCNet in this section: the overall structure is given in Section 3.1, and the two key components of BCNet, i.e., the CSM and BFF modules, are described in Sections 3.2 and 3.3, respectively.
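
As a reading aid, the sketch below shows one hypothetical wiring of this bidirectional pathway, reusing the CSM and BFF sketches from the introduction: a top-down CSM chain produces region features and a global edge branch, and BFF modules then refine the region features with the edge features. The five-stage backbone layout, shared channel width, and head placement are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical wiring of the bidirectional pathway; reuses the CSM and
# BFF sketches defined earlier. Backbone layout and widths are assumed.
import torch.nn as nn
import torch.nn.functional as F


class BCNetDecoder(nn.Module):
    """Top-down CSM chain -> region features + edge branch; BFF modules
    then refine the region features with the global edge features."""

    def __init__(self, channels=64, num_levels=4):
        super().__init__()
        self.csms = nn.ModuleList(CSM(channels) for _ in range(num_levels))
        self.bffs = nn.ModuleList(BFF(channels) for _ in range(num_levels))
        self.edge_head = nn.Conv2d(channels, 1, 1)    # supervised by edge loss
        self.region_head = nn.Conv2d(channels, 1, 1)  # supervised by region loss

    def forward(self, feats):
        # feats: backbone features ordered shallow -> deep, each already
        # reduced to `channels` channels by 1x1 convolutions (assumed).
        regions, deep = [], feats[-1]
        for low, csm in zip(reversed(feats[:-1]), self.csms):
            deep = csm(low, deep)   # top-down progressive propagation
            regions.append(deep)
        edge = regions[-1]          # highest-resolution fused map -> edges
        edge_pred = self.edge_head(edge)
        # The global edge features refine every region representation.
        refined = [
            bff(r, F.interpolate(edge, size=r.shape[2:], mode="bilinear",
                                 align_corners=False))
            for r, bff in zip(regions, self.bffs)
        ]
        region_pred = self.region_head(refined[-1])
        return region_pred, edge_pred  # jointly trained with two losses
```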

Datasets and metrics

Similar to [82], [64], [28], [83], we train BCNet on the DUTS-TR [84] dataset, which contains 10,553 images from various scenes. Since no boundary annotation is available for DUTS-TR, we apply the well-known Canny edge detector [85] to the ground-truth object masks to obtain the corresponding edge maps. We evaluate BCNet on five SOD datasets: ECSSD [86], PASCAL-S [87], OMRON [32], [33], DUTS (DUTS-TE) [84], and HKU-IS [88]. ECSSD contains 1000 images that are semantically meaningful but structurally complex.
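
The edge maps can be produced offline from the ground-truth masks, e.g., with OpenCV's Canny detector as sketched below; the thresholds and file layout are our assumptions, not the paper's exact setup.

```python
# Deriving boundary labels from binary ground-truth masks via Canny, as
# described above. Thresholds and paths are illustrative assumptions.
import cv2
import numpy as np


def mask_to_edge(mask_path: str, out_path: str) -> None:
    """Convert a binary saliency mask image into a thin edge map."""
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    mask = (mask > 127).astype(np.uint8) * 255  # binarize the GT mask
    edge = cv2.Canny(mask, 100, 200)            # Canny edge detection
    cv2.imwrite(out_path, edge)


# Hypothetical file layout for the DUTS-TR masks and output edge maps.
mask_to_edge("DUTS-TR/masks/sample.png", "DUTS-TR/edges/sample.png")
```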

Conclusion

We have proposed BCNet, a real-time end-to-end trainable scheme for edge-guided salient object detection. Extensive experiments are conducted to compare BCNet with 17 state-of-the-art deep models, including several of the latest edge-guided ones. BCNet shows comparable or superior performance against these models in terms of both accuracy and speed on five challenging benchmark datasets. The effectiveness of the CSM and BFF modules is also validated by extensive ablation studies. In the future, we …

CRediT authorship contribution statement

Bo Dong: Conceptualization, Investigation, Software, Writing - original draft. Yan Zhou: Data curation, Methodology, Software. Chuanfei Hu: Methodology, Software. Keren Fu: Project administration, Supervision, Writing - review & editing. Geng Chen: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported by the NSFC under Grants 61703077, 61773270, and 61971005, the Fundamental Research Funds for the Central Universities under Grant YJ201755, and the Sichuan Science and Technology Major Projects (2018GZDZX0029).


References (100)

  • M.-M. Cheng et al., RepFinder: finding approximately repeated scene elements for image editing, ACM Trans. Graph. (2010)
  • K. Fu et al., Normalized cut-based saliency detection by adaptive multi-level region merging, IEEE Trans. Image Process. (2015)
  • K. Fu et al., Saliency detection by fully learning a continuous conditional random field, IEEE Trans. Multimedia (2017)
  • F. Perazzi, P. Krähenbühl, Y. Pritch, A. Hornung, Saliency filters: contrast based filtering for salient region...
  • W.-C. Tu et al., Real-time salient object detection with a minimum spanning tree
  • C. Hu, Y. Wang, An efficient CNN model based on object-level attention mechanism for casting defects detection on...
  • W. Wang et al., Correspondence driven saliency transfer, IEEE Trans. Image Process. (2016)
  • X. Zhang et al., Progressive attention guided recurrent network for salient object detection
  • Z. Wu et al., Cascaded partial decoder for fast and accurate salient object detection
  • L. Zhang et al., A bi-directional message passing model for salient object detection
  • W. Wang, J. Shen, M.-M. Cheng, L. Shao, An iterative and cooperative top-down and bottom-up inference network for...
  • N. Liu et al., DHSNet: deep hierarchical saliency network for salient object detection
  • Z. Luo et al., Non-local deep features for salient object detection
  • X. Qin et al., BASNet: boundary-aware salient object detection
  • Y. Wang et al., Focal boundary guided salient object detection, IEEE Trans. Image Process. (2019)
  • J.-J. Liu et al., A simple pooling-based design for real-time salient object detection
  • W. Wang et al., Salient object detection with pyramid attention and salient edges
  • W. Wang et al., Salient object detection driven by fixation prediction
  • J.-X. Zhao et al., EGNet: edge guidance network for salient object detection
  • S. Woo, J. Park, J.-Y. Lee, I. So Kweon, CBAM: convolutional block attention module, in: Proceedings of the European...
  • J. Hu et al., Squeeze-and-excitation networks
  • J. Liang, J. Zhou, X. Bai, Y. Qian, Salient object detection in hyperspectral imagery, in: 2013 IEEE International...
  • C. Yang et al., Saliency detection via graph-based manifold ranking
  • L. Zhang et al., Ranking saliency, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
  • D.-P. Fan et al., Salient objects in clutter: bringing salient object detection to the foreground
  • J. Han et al., Advanced deep-learning techniques for salient and category-specific object detection: a survey, IEEE Signal Process. Mag. (2018)
  • D.-P. Fan, Z. Lin, Z. Zhang, M. Zhu, M.-M. Cheng, Rethinking RGB-D salient object detection: models, datasets, and...
  • K. Fu et al., JL-DCF: joint learning and densely-cooperative fusion framework for RGB-D salient object detection
  • J. Zhang et al., UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders
  • J.-X. Zhao et al., Contrast prior and fluid pyramid integration for RGBD salient object detection
  • Z. Zhang, Z. Lin, J. Xu, W. Jin, S.-P. Lu, D.-P. Fan, Bilateral attention network for RGB-D salient object detection,...
  • D.-P. Fan, Y. Zhai, A. Borji, J. Yang, L. Shao, BBS-Net: RGB-D salient object detection with a bifurcated backbone...
  • Y. Zhai, D.-P. Fan, J. Yang, A. Borji, L. Shao, J. Han, L. Wang, Bifurcated backbone strategy for RGB-D salient object...
  • T. Zhou, D.-P. Fan, M.-M. Cheng, J. Shen, L. Shao, RGB-D salient object detection: a survey, Computational Visual...
  • K. Fu, D.-P. Fan, G.-P. Ji, Q. Zhao, J. Shen, C. Zhu, Siamese network for RGB-D salient object detection and beyond,...
  • M. Zhang, J. Li, J. Wei, Y. Piao, H. Lu, Memory-oriented decoder for light field salient object detection, in: Advances...
  • T. Wang et al., Deep learning for light field saliency detection
  • Y. Jiang, T. Zhou, G.-P. Ji, K. Fu, Q. Zhao, D.-P. Fan, Light field salient object detection: a review and benchmark,...
  • Y. Zeng et al., Towards high-resolution salient object detection
  • D.-P. Fan et al., Shifting more attention to video salient object detection

Cited by (26)

    • MEANet: Multi-modal edge-aware network for light field salient object detection

      2022, Neurocomputing
      Citation Excerpt :

      Accurate boundaries are essential for high-quality segmentation maps, since SOD/semantic segmentation is a pixel-wise segmentation task. Recently, edge-aware models are drawing increasing research attention in the RGB/RGB-D SOD as well as semantic segmentation fields, and numerous effective models have been proposed [14,45,15,16,52,53,31,28]. In the RGB SOD field, motivated by the logical interrelations between binary segmentation and edge maps, Wu et al. [52] proposed a stacked cross refinement network to generate saliency maps with accurate boundaries.

    • Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection

      2022, Neurocomputing
      Citation Excerpt :

      Traditional SOD methods are mainly based on hand-crafted features, such as color, contrast, context, and background priors. With the rapid development of deep learning, CNN-based SOD methods [8–11,60–63] have been proposed and have achieved state-of-the-art performance. In particular, fully convolutional networks (FCNs) have become the mainstream approach for the SOD task.

    • Human-related anomalous event detection via spatial-temporal graph convolutional autoencoder with embedded long short-term memory network

      2022, Neurocomputing
      Citation Excerpt :

      In [30], the normal activities of each spatial-temporal block were learnt based on the mixture of dynamic textures (MDT), and outliers with respect to the model were detected as anomalies. Nowadays, deep neural networks (DNNs) have been applied to various computer vision tasks, such as object detection [31,32], action recognition [33,34], and semantic segmentation [35,36], and have achieved remarkable performance due to their capability to extract high-level features. Researchers have correspondingly investigated employing deep networks to address the anomalous event detection problem.

    • Multi-pathway feature integration network for salient object detection

      2021, Neurocomputing
      Citation Excerpt :

      As a result, traditional methods are generally limited to salient object detection in simple scenes. Many studies [4–6] have shown that convolutional neural networks (CNNs) trained using image samples can extract rich semantic features automatically. These deep features represent the diverse characteristics of objects from different perspectives.


    Bo Dong is currently an undergraduate student majoring in automation with the University of Shanghai for Science and Technology, Shanghai, China. His current research interests include salient object detection and deep learning.

    Yan Zhou is currently an undergraduate student majoring in automation with the University of Shanghai for Science and Technology, Shanghai, China. Her current research interests include computer vision and deep learning.

    Chuanfei Hu received his B.S. degree from Jiangsu University of Science and Technology, Zhenjiang, China. He is currently working toward the M.S. degree in control engineering at the University of Shanghai for Science and Technology, Shanghai, China. His research interests include computer vision and applications of deep learning.

    Keren Fu received the dual Ph.D. degrees from Shanghai Jiao Tong University, Shanghai, China, and Chalmers University of Technology, Gothenburg, Sweden, under the joint supervision of Prof. Jie Yang and Prof. Irene Yu-Hua Gu. He is currently a research associate professor with College of Computer Science, Sichuan University, Chengdu, China. His current research interests include visual computing, saliency analysis, and machine learning.

    Geng Chen is a research scientist at the Inception Institute of Artificial Intelligence, UAE. He received his Ph.D. from Northwestern Polytechnical University, China, in 2016. He was a postdoctoral research associate at the University of North Carolina at Chapel Hill, USA, from 2016 to 2019. He has published more than 40 papers in peer-reviewed international conferences proceedings and journals. His research interests lie in geometric deep learning and medical image analysis.

    1. Equal contributions.
