Deep Co-saliency Detection via Stacked Autoencoder-enabled Fusion and Self-trained CNNs
IEEE Transactions on Multimedia (IF 7.3), Pub Date: 2020-04-01, DOI: 10.1109/tmm.2019.2936803
Chung-Chi Tsai, Kuang-Jui Hsu, Yen-Yu Lin, Xiaoning Qian, Yung-Yu Chuang

Image co-saliency detection via fusion-based or learning-based methods faces cross-cutting issues. Fusion-based methods often combine saliency proposals using a majority voting rule, so their performance depends heavily on the quality and coherence of the individual proposals. Learning-based methods typically require ground-truth annotations for training, which are not available for co-saliency detection. In this work, we present a two-stage approach that addresses these issues jointly. In the first stage, an unsupervised deep learning model with a stacked autoencoder (SAE) is proposed to evaluate the quality of saliency proposals. It employs latent representations of image foregrounds and auto-encodes foreground consistency and foreground-background distinctiveness in a discriminative way. The resulting model, SAE-enabled fusion (SAEF), can combine multiple saliency proposals into a more reliable saliency map. In the second stage, motivated by the observation that fusion often yields over-smoothed saliency maps, we develop self-trained convolutional neural networks (STCNN) to alleviate this effect. STCNN takes the saliency maps produced by SAEF as input and propagates information from regions of high confidence to those of low confidence. During propagation, feature representations are distilled, resulting in sharper and better co-saliency maps. Our approach is comprehensively evaluated on three benchmarks, MSRC, iCoseg, and Cosal2015, and performs favorably against state-of-the-art methods. In addition, we show that our method can be applied to object co-segmentation and object co-localization, achieving state-of-the-art performance in both applications.
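As a rough illustration of the first stage, the minimal PyTorch sketch below scores each saliency proposal by how well a small stacked autoencoder reconstructs its pooled foreground features, then fuses the proposals with weights favoring low reconstruction error. All class and function names, feature dimensions, and the softmax weighting rule are assumptions made for illustration, and the unsupervised training of the autoencoder is omitted; this is not the authors' implementation.

# Illustrative sketch of SAE-enabled fusion (SAEF), not the authors' code.
# Assumptions: per-image CNN features of shape (C, H, W), soft saliency
# proposals of shape (H, W) in [0, 1], and a softmax weighting over
# reconstruction errors; the paper's exact scoring differs.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StackedAutoencoder(nn.Module):
    """Two-layer encoder/decoder over pooled foreground descriptors."""
    def __init__(self, in_dim=512, hid_dim=128, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, code_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, in_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def masked_pool(features, saliency):
    """Average a feature map (C, H, W) under a soft saliency mask (H, W)."""
    w = saliency / (saliency.sum() + 1e-8)
    return (features * w.unsqueeze(0)).sum(dim=(1, 2))


def saef_fuse(features, proposals, sae, sharpness=10.0):
    """Weight each proposal by its foreground reconstruction error and fuse."""
    errors = []
    for s in proposals:
        fg = masked_pool(features, s)          # foreground descriptor
        errors.append(F.mse_loss(sae(fg), fg))  # lower error -> better proposal
    weights = torch.softmax(-sharpness * torch.stack(errors), dim=0)
    fused = sum(w * s for w, s in zip(weights, proposals))
    return fused, weights


if __name__ == "__main__":
    # Toy example with random features and three hypothetical proposals.
    feats = torch.rand(512, 32, 32)
    props = [torch.rand(32, 32) for _ in range(3)]
    fused, w = saef_fuse(feats, props, StackedAutoencoder())
    print(fused.shape, w)

Note that the sketch only captures the proposal-quality-to-fusion-weight idea; the SAEF model in the paper additionally encodes foreground consistency across images and foreground-background distinctiveness, and the second-stage STCNN refinement is not shown.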

Updated: 2020-04-01