AMDFNet: Adaptive multi-level deformable fusion network for RGB-D saliency detection
Neurocomputing (IF 6) Pub Date: 2021-09-03, DOI: 10.1016/j.neucom.2021.08.116
Fei Li, Jiangbin Zheng, Yuan-fang Zhang, Nian Liu, Wenjing Jia

Effective exploration of useful contextual information in multi-modal images is an essential task in salient object detection. However, existing methods based on early-fusion or late-fusion schemes cannot address this problem, as they fail to effectively resolve the distribution gap between modalities and the resulting information loss. In this paper, we propose an adaptive multi-level deformable fusion network (AMDFNet) to exploit cross-modality information. We use a cross-modality deformable convolution module to dynamically adjust the boundaries of salient objects by exploring the extra input from the other modality. This allows the network to incorporate existing features and propagate more context, strengthening the model's ability to perceive scenes. To accurately refine the predicted saliency maps, a multi-scale feature refinement module is proposed to enhance the intermediate features with multi-level predictions in the decoder. Furthermore, we introduce a selective cross-modality attention module into the fusion process to exploit the attention mechanism. This module captures dense long-range cross-modality dependencies from the perspective of multi-modal hierarchical features. This strategy enables the network to select more informative details and suppress the contamination caused by unreliable depth maps. Experimental results on eight benchmark datasets demonstrate the effectiveness of the individual components of our proposed model, as well as of the overall saliency model.
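To make the attention idea concrete, the following is a minimal, illustrative sketch of generic cross-modality dot-product attention with a gating step, written in plain NumPy. It is not the paper's actual selective cross-modality attention module; the flattened (positions x channels) feature shapes, the scaled-softmax affinity, and the sigmoid gate used to suppress an unreliable depth contribution are all assumptions made for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modality_attention(rgb_feat, depth_feat):
    """Let each RGB position attend over all depth positions.

    rgb_feat, depth_feat: (N, C) arrays of N flattened spatial
    positions with C channels (a simplifying assumption; real
    feature maps would be (C, H, W) tensors).
    Returns fused features of shape (N, C).
    """
    scale = 1.0 / np.sqrt(rgb_feat.shape[1])
    # Dense affinity between every RGB position and every depth position.
    affinity = rgb_feat @ depth_feat.T * scale          # (N, N)
    attn = softmax(affinity, axis=-1)                   # rows sum to 1
    depth_context = attn @ depth_feat                   # depth info gathered per RGB position
    # Hypothetical gate: down-weight the depth contribution where it
    # disagrees with the RGB feature, suppressing noisy depth.
    gate = 1.0 / (1.0 + np.exp(-(rgb_feat * depth_context).sum(axis=1, keepdims=True)))
    return rgb_feat + gate * depth_context

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))
depth = rng.standard_normal((16, 8))
fused = cross_modality_attention(rgb, depth)
print(fused.shape)  # (16, 8)
```

The per-position gate is the part that loosely mirrors the "selective" aspect described in the abstract: when the gathered depth context is uninformative, its contribution to the fused feature shrinks toward zero instead of contaminating the RGB stream.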




Updated: 2021-09-16