
Neurocomputing

Volume 437, 21 May 2021, Pages 58-71

BCNet: Bidirectional collaboration network for edge-guided salient object detection

https://doi.org/10.1016/j.neucom.2021.01.034

Abstract

The boundary quality is a key factor determining the success of accurate salient object detection (SOD). A number of edge-guided SOD methods have been proposed to improve the boundary quality, but they have shown unsatisfactory performance due to the lack of a comprehensive consideration of multi-level feature fusion and multi-type feature aggregation. To resolve this issue, we propose a novel Bidirectional Collaboration Network (BCNet), which integrates effective multi-level feature fusion and multi-type feature aggregation into a unified edge-guided SOD framework. Specifically, we first utilize multiple Consistency Saliency Maximization (CSM) modules to propagate the highest-level semantic representations along a top-down progressive pathway, generating both global edge representations and a series of region representations. Multiple Bounded Feature Fusion (BFF) modules are then utilized to refine the region features with the edge features. The CSM and BFF modules enable robust multi-level feature fusion and multi-type feature aggregation with only little extra computation, allowing high computational efficiency. Finally, BCNet is jointly trained with edge and region losses in an end-to-end manner. Extensive comparisons are conducted with 17 state-of-the-art methods on five challenging benchmark datasets. Thanks to the CSM and BFF modules, our BCNet outperforms existing deep learning-based SOD methods, including the latest edge-guided ones, in terms of both detection accuracy and processing speed.

Introduction

Salient object detection (SOD), acting as a powerful pre-processing tool in numerous computer vision tasks, mimics the human visual attention mechanism for identifying attention-grabbing objects in natural images. It has a large number of applications, such as autonomous driving [1], robot navigation [2], visual tracking [3], image retrieval [4], aesthetics assessment [5], and content-aware image editing [6]. Inspired by progress in perceptual psychology, early models detect salient objects using heuristic priors [7], [8] and hand-crafted features such as contrast [9] and distance transformation [10]. However, their detection performance is seriously limited in complex scenarios. Recent works have demonstrated that deep learning techniques, especially Convolutional Neural Networks (CNNs) [11], [12], [13], [14], are particularly good at understanding visual concepts by extracting semantic features from image regions, and have achieved remarkable performance [15], [16], [17], [18]. Despite these advantages, existing methods suffer from two major limitations. First, it remains challenging to detect entire salient objects against a complex background, even with deep learning-based methods. Second, most existing methods are unable to accurately detect the boundaries of salient objects.

To overcome these limitations, a number of methods have been proposed in recent years [19], [20], [21], [22], [23], [24], [25], [26]. For instance, fusing multi-level features from low-level and high-level convolutional layers [20], [21], [27] improves the detection of objects against a complex background. In addition, resorting to additional edge guidance [24], [25], [26], [28] improves the accuracy of boundaries. However, most existing solutions focus on only one of these limitations while overlooking the other. Moreover, although edge-guided methods such as [24], [28] provide encouraging boundary quality, the aggregation of multi-type features, i.e., region and edge features, is achieved by naive concatenation or element-wise addition/multiplication, which can be suboptimal and ineffective.

To this end, we propose a novel bidirectional collaboration network, called BCNet, which integrates effective multi-level feature fusion and multi-type feature aggregation into a unified SOD framework. BCNet utilizes the edge features to guide the region features, which automatically discards low-quality features and highlights more edge details, as shown in Fig. 1. Specifically, we introduce a module, called Consistency Saliency Maximization (CSM), which is inspired by the spatial attention mechanism [29], and embed it into BCNet to mitigate the discrepancy between different levels of features for effective feature fusion. Fig. 1(d) and (e) show the edge feature map and the feature map generated by fusing shallow features (b) and high-level features (c) with our CSM module. As can be observed, after fusion, the entire objects become clear and background noise is suppressed. To improve the edge sharpness and accuracy, we introduce another module, called Bounded Feature Fusion (BFF), which is inspired by the Squeeze-Excitation block in [30], to aggregate the multi-type features provided by the CSM modules. Different from existing methods relying on simple concatenation or element-wise operations [24], [28], BFF utilizes effective feature re-weighting to sharpen the edges (Fig. 1(f)). Finally, BCNet is jointly trained with edge and region losses in an end-to-end manner. It is worth noting that the CSM and BFF modules enable effective multi-level feature fusion and multi-type feature aggregation while inducing only little extra computation, allowing real-time processing at 52 fps; an illustrative sketch of both modules is given after the contribution list below. The main contributions of this paper are as follows:

  • We propose a novel bidirectional collaboration network BCNet for edge-guided SOD, which effectively addresses multi-level feature fusion and multi-type feature aggregation within a unified framework. Accordingly, two modules, called Consistency Saliency Maximization (CSM) and Bounded Feature Fusion (BFF), are introduced.

  • We construct a new bidirectional collaboration architecture for BCNet, where local region features are first organized in a top-down progressive pathway to propagate the highest level semantic representations, and then the global edge features are used to refine the obtained region features for final prediction.

  • Extensive comparisons are conducted with 17 state-of-the-art (SOTA) methods on five challenging benchmark datasets, demonstrating that the proposed BCNet performs favorably against the latest SOTA models in terms of both accuracy and speed. Notably, BCNet achieves a real-time speed of 52 fps, making it one of the fastest models compared with SOTA methods.
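
To make the roles of the two modules concrete, the following minimal sketch illustrates one plausible realization of a CSM-style spatial-attention fusion and a BFF-style squeeze-excitation re-weighting. PyTorch, the layer choices, channel widths, and the exact gating formulation are all our assumptions for illustration; this is not the authors' released implementation.

```python
# Minimal sketch (PyTorch assumed) of plausible CSM- and BFF-style modules.
# Layer choices and gating formulation are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CSM(nn.Module):
    """Spatial-attention-inspired fusion of a shallow (low-level) and a
    deep (high-level) feature map, suppressing background responses."""

    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=7, padding=3)
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low, high):
        # Upsample the deeper features to the shallow resolution.
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear",
                             align_corners=False)
        # A spatial attention map derived from the semantic features gates
        # the shallow features, keeping only consistent salient regions.
        gate = torch.sigmoid(self.attn(high))
        return self.fuse(low * gate + high)


class BFF(nn.Module):
    """Squeeze-Excitation-style re-weighting of concatenated region and
    edge features, replacing naive concatenation or element-wise ops."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, region, edge):
        x = torch.cat([region, edge], dim=1)
        b, c, _, _ = x.shape
        # Channel re-weighting: squeeze to a descriptor, excite to weights.
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return self.project(x * w)
```

The sketch mirrors the text: CSM uses an attention map derived from the deeper features to suppress background noise in the shallow ones, while BFF replaces naive concatenation or addition with learned channel re-weighting.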

The rest of the paper is organized as follows. Section 2 describes related works on deep SOD, especially the edge-guided models. Section 3 describes the proposed BCNet in detail. Experimental results, performance evaluation and comparisons are presented in Section 4. Finally, conclusions are drawn in Section 5.

Section snippets

Related works

There are various types of SOD problems, such as RGB SOD [31], [23], [32], [33], [34], [35], RGB-D SOD [36], [37], [38], [39], [40], [41], [42], [43], [44], light field SOD [45], [46], [47], high-resolution SOD [48], video SOD (VSOD) [49], [50], 360° omnidirectional SOD [51], and co-salient object detection (Co-SOD) [52], [53], [54]. In addition, salient objects can also act as negative samples for camouflaged objects [55], [56]. This paper focuses on the RGB SOD problem, which aims to detect salient objects from single RGB images.

Bidirectional collaboration network

Bidirectional Collaboration Network (BCNet) is an edge-guided framework for SOD. It resolves the limitations of existing edge-guided SOD methods by integrating multi-level feature fusion and multi-type feature aggregation in a unified framework. We detail BCNet in this section: the overall structure is given in Section 3.1, and the two key components of BCNet, i.e., the CSM and BFF modules, are described in Sections 3.2 and 3.3, respectively.
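
As a reading aid, the sketch below shows one hypothetical wiring of this bidirectional pathway, reusing the CSM and BFF sketches from the introduction: a top-down CSM chain produces region features and a global edge branch, and BFF modules then refine the region features with the edge features. The five-stage backbone layout, shared channel width, and head placement are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical wiring of the bidirectional pathway; reuses the CSM and
# BFF sketches defined earlier. Backbone layout and widths are assumed.
import torch.nn as nn
import torch.nn.functional as F


class BCNetDecoder(nn.Module):
    """Top-down CSM chain -> region features + edge branch; BFF modules
    then refine the region features with the global edge features."""

    def __init__(self, channels=64, num_levels=4):
        super().__init__()
        self.csms = nn.ModuleList(CSM(channels) for _ in range(num_levels))
        self.bffs = nn.ModuleList(BFF(channels) for _ in range(num_levels))
        self.edge_head = nn.Conv2d(channels, 1, 1)    # supervised by edge loss
        self.region_head = nn.Conv2d(channels, 1, 1)  # supervised by region loss

    def forward(self, feats):
        # feats: backbone features ordered shallow -> deep, each already
        # reduced to `channels` channels by 1x1 convolutions (assumed).
        regions, deep = [], feats[-1]
        for low, csm in zip(reversed(feats[:-1]), self.csms):
            deep = csm(low, deep)   # top-down progressive propagation
            regions.append(deep)
        edge = regions[-1]          # highest-resolution fused map -> edges
        edge_pred = self.edge_head(edge)
        # The global edge features refine every region representation.
        refined = [
            bff(r, F.interpolate(edge, size=r.shape[2:], mode="bilinear",
                                 align_corners=False))
            for r, bff in zip(regions, self.bffs)
        ]
        region_pred = self.region_head(refined[-1])
        return region_pred, edge_pred  # jointly trained with two losses
```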

Datasets and metrics

Similar to [82], [64], [28], [83], we train BCNet on the DUTS-TR [84] dataset, which contains 10,553 images from various scenes. Since no boundary annotation is available for DUTS-TR, we apply the well-known Canny edge detector [85] to the ground-truth object masks to obtain the corresponding edge maps. We evaluate BCNet on five SOD datasets: ECSSD [86], PASCAL-S [87], OMRON [32], [33], DUTS (DUTS-TE) [84], and HKU-IS [88]. ECSSD contains 1000 images that are semantically meaningful but structurally complex.
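
The edge maps can be produced offline from the ground-truth masks, e.g., with OpenCV's Canny detector as sketched below; the thresholds and file layout are our assumptions, not the paper's exact setup.

```python
# Deriving boundary labels from binary ground-truth masks via Canny, as
# described above. Thresholds and paths are illustrative assumptions.
import cv2
import numpy as np


def mask_to_edge(mask_path: str, out_path: str) -> None:
    """Convert a binary saliency mask image into a thin edge map."""
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    mask = (mask > 127).astype(np.uint8) * 255  # binarize the GT mask
    edge = cv2.Canny(mask, 100, 200)            # Canny edge detection
    cv2.imwrite(out_path, edge)


# Hypothetical file layout for the DUTS-TR masks and output edge maps.
mask_to_edge("DUTS-TR/masks/sample.png", "DUTS-TR/edges/sample.png")
```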

Conclusion

We have proposed BCNet, a real-time end-to-end trainable scheme for edge-guided salient object detection. Extensive experiments are conducted to compare BCNet with 17 state-of-the-art deep models, including several of the latest edge-guided ones. BCNet shows comparable or superior performance against these models in terms of both accuracy and speed on five challenging benchmark datasets. The effectiveness of the CSM and BFF modules is also validated by extensive ablation studies. In the future, we …

CRediT authorship contribution statement

Bo Dong: Conceptualization, Investigation, Software, Writing - original draft. Yan Zhou: Data curation, Methodology, Software. Chuanfei Hu: Methodology, Software. Keren Fu: Project administration, Supervision, Writing - review & editing. Geng Chen: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported by the NSFC under Grants 61703077, 61773270, and 61971005, the Fundamental Research Funds for the Central Universities under Grant YJ201755, and the Sichuan Science and Technology Major Projects (2018GZDZX0029).


References (100)

  • M.-M. Cheng et al., RepFinder: finding approximately repeated scene elements for image editing, ACM Trans. Graph. (2010)
  • K. Fu et al., Normalized cut-based saliency detection by adaptive multi-level region merging, IEEE Trans. Image Process. (2015)
  • K. Fu et al., Saliency detection by fully learning a continuous conditional random field, IEEE Trans. Multimedia (2017)
  • F. Perazzi, P. Krähenbühl, Y. Pritch, A. Hornung, Saliency filters: contrast based filtering for salient region...
  • W.-C. Tu et al., Real-time salient object detection with a minimum spanning tree
  • C. Hu, Y. Wang, An efficient CNN model based on object-level attention mechanism for casting defects detection on...
  • W. Wang et al., Correspondence driven saliency transfer, IEEE Trans. Image Process. (2016)
  • X. Zhang et al., Progressive attention guided recurrent network for salient object detection
  • Z. Wu et al., Cascaded partial decoder for fast and accurate salient object detection
  • L. Zhang et al., A bi-directional message passing model for salient object detection
  • W. Wang, J. Shen, M.-M. Cheng, L. Shao, An iterative and cooperative top-down and bottom-up inference network for...
  • N. Liu et al., DHSNet: deep hierarchical saliency network for salient object detection
  • Z. Luo et al., Non-local deep features for salient object detection
  • X. Qin et al., BASNet: boundary-aware salient object detection
  • Y. Wang et al., Focal boundary guided salient object detection, IEEE Trans. Image Process. (2019)
  • J.-J. Liu et al., A simple pooling-based design for real-time salient object detection
  • W. Wang et al., Salient object detection with pyramid attention and salient edges
  • W. Wang et al., Salient object detection driven by fixation prediction
  • J.-X. Zhao et al., EGNet: edge guidance network for salient object detection
  • S. Woo, J. Park, J.-Y. Lee, I. So Kweon, CBAM: convolutional block attention module, in: Proceedings of the European...
  • J. Hu et al., Squeeze-and-excitation networks
  • J. Liang, J. Zhou, X. Bai, Y. Qian, Salient object detection in hyperspectral imagery, in: 2013 IEEE International...
  • C. Yang et al., Saliency detection via graph-based manifold ranking
  • L. Zhang et al., Ranking saliency, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
  • D.-P. Fan et al., Salient objects in clutter: bringing salient object detection to the foreground
  • J. Han et al., Advanced deep-learning techniques for salient and category-specific object detection: a survey, IEEE Signal Process. Mag. (2018)
  • D.-P. Fan, Z. Lin, Z. Zhang, M. Zhu, M.-M. Cheng, Rethinking RGB-D salient object detection: models, datasets, and...
  • K. Fu et al., JL-DCF: joint learning and densely-cooperative fusion framework for RGB-D salient object detection
  • J. Zhang et al., UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders
  • J.-X. Zhao et al., Contrast prior and fluid pyramid integration for RGBD salient object detection
  • Z. Zhang, Z. Lin, J. Xu, W. Jin, S.-P. Lu, D.-P. Fan, Bilateral attention network for RGB-D salient object detection,...
  • D.-P. Fan, Y. Zhai, A. Borji, J. Yang, L. Shao, BBS-Net: RGB-D salient object detection with a bifurcated backbone...
  • Y. Zhai, D.-P. Fan, J. Yang, A. Borji, L. Shao, J. Han, L. Wang, Bifurcated backbone strategy for RGB-D salient object...
  • T. Zhou, D.-P. Fan, M.-M. Cheng, J. Shen, L. Shao, RGB-D salient object detection: a survey, Computational Visual...
  • K. Fu, D.-P. Fan, G.-P. Ji, Q. Zhao, J. Shen, C. Zhu, Siamese network for RGB-D salient object detection and beyond,...
  • M. Zhang, J. Li, J. Wei, Y. Piao, H. Lu, Memory-oriented decoder for light field salient object detection, in: Advances...
  • T. Wang et al., Deep learning for light field saliency detection
  • Y. Jiang, T. Zhou, G.-P. Ji, K. Fu, Q. Zhao, D.-P. Fan, Light field salient object detection: a review and benchmark,...
  • Y. Zeng et al., Towards high-resolution salient object detection
  • D.-P. Fan et al., Shifting more attention to video salient object detection

Cited by (26)

    • MEANet: Multi-modal edge-aware network for light field salient object detection

      2022, Neurocomputing
      Citation Excerpt :

      Accurate boundaries are essential for high-quality segmentation maps, since SOD/semantic segmentation is a pixel-wise segmentation task. Recently, edge-aware models are drawing increasing research attention in the RGB/RGB-D SOD as well as semantic segmentation fields, and numerous effective models have been proposed [14,45,15,16,52,53,31,28]. In the RGB SOD field, motivated by the logical interrelations between binary segmentation and edge maps, Wu et al. [52] proposed a stacked cross refinement network to generate saliency maps with accurate boundaries.

    • Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection

      2022, Neurocomputing
      Citation Excerpt :

      Traditional SOD methods are mainly based on hand-crafted features, such as color, contrast, context, and background priors. With the rapid development of deep learning, CNN-based SOD methods [8–11,60–63] have been proposed and have achieved state-of-the-art performance. In particular, fully convolutional networks (FCNs) have become the mainstream approach for the SOD task.

    • Human-related anomalous event detection via spatial-temporal graph convolutional autoencoder with embedded long short-term memory network

      2022, Neurocomputing
      Citation Excerpt :

      In [30], the normal activities of each spatial-temporal block were learnt based on the mixture of dynamic textures (MDT), and outliers with respect to the model were detected as anomalies. Nowadays, deep neural networks (DNNs) have been applied to various computer vision tasks, such as object detection [31,32], action recognition [33,34], and semantic segmentation [35,36], and have achieved remarkable performance due to their capability to extract high-level features. Researchers have correspondingly investigated employing deep networks to address the anomalous event detection problem.

    • Multi-pathway feature integration network for salient object detection

      2021, Neurocomputing
      Citation Excerpt :

      As a result, traditional methods are generally limited to salient object detection in simple scenes. Many studies [4–6] have shown that convolutional neural networks (CNNs) trained using image samples can extract rich semantic features automatically. These deep features represent the diverse characteristics of objects from different perspectives.


    Bo Dong is currently an undergraduate student majoring in automation with the University of Shanghai for Science and Technology, Shanghai, China. His current research interests include salient object detection and deep learning.

    Yan Zhou is currently an undergraduate student majoring in automation with the University of Shanghai for Science and Technology, Shanghai, China. Her current research interests include computer vision and deep learning.

    Chuanfei Hu received his B.S. degree from Jiangsu University of Science and Technology, Zhenjiang, China. He is currently working toward the M.S. degree in control engineering at the University of Shanghai for Science and Technology, Shanghai, China. His research interests include computer vision and applications of deep learning.

    Keren Fu received the dual Ph.D. degrees from Shanghai Jiao Tong University, Shanghai, China, and Chalmers University of Technology, Gothenburg, Sweden, under the joint supervision of Prof. Jie Yang and Prof. Irene Yu-Hua Gu. He is currently a research associate professor with College of Computer Science, Sichuan University, Chengdu, China. His current research interests include visual computing, saliency analysis, and machine learning.

    Geng Chen is a research scientist at the Inception Institute of Artificial Intelligence, UAE. He received his Ph.D. from Northwestern Polytechnical University, China, in 2016. He was a postdoctoral research associate at the University of North Carolina at Chapel Hill, USA, from 2016 to 2019. He has published more than 40 papers in peer-reviewed international conferences proceedings and journals. His research interests lie in geometric deep learning and medical image analysis.

    1. Equal contributions.
