Improving RGB-D Salient Object Detection via Modality-Aware Decoder
IEEE Transactions on Image Processing (IF 10.8), Pub Date: 2022-09-16, DOI: 10.1109/tip.2022.3205747
Mengke Song, Wenfeng Song, Guowei Yang, Chenglizhao Chen
Most existing RGB-D salient object detection (SOD) methods focus primarily on cross-modal and cross-level saliency fusion, which has proven efficient and effective. However, these methods still have a critical limitation: their fusion patterns, typically selective characteristics and their variations, depend too heavily on the network's non-linear adaptability. In such methods, the balance between RGB and D (depth) is formulated individually over intermediate feature slices, so the relation at the modality level may not be learned properly. The optimal RGB-D combination differs from scene to scene, and the exact complementary status is frequently determined by multiple modality-level factors, such as depth quality, the complexity of the RGB scene, and the degree of harmony between them. Existing approaches may therefore struggle to achieve further performance breakthroughs, since their methodologies are comparatively insensitive to modality. To address this problem, this paper presents the Modality-aware Decoder (MaD). The key technical innovations include a series of feature-embedding, modality-reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely used multi-scale, multi-level decoding process to be modality-aware. Our MaD achieves competitive performance against other state-of-the-art (SOTA) models without any fancy tricks in the decoder's design. Code and results will be publicly available at https://github.com/MengkeSong/MaD.
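The distinction the abstract draws — weighting RGB and depth as whole modalities rather than per intermediate feature slice — can be sketched roughly as follows. This is a minimal illustrative toy, not the paper's actual MaD architecture: the global-pooled descriptors, the single projection vector standing in for a learned scorer, and the two-way softmax gate are all assumptions made for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def modality_weights(rgb_feat, d_feat, w_proj):
    """Score each modality from a single global descriptor (modality level),
    rather than balancing RGB and D channel-by-channel (slice level)."""
    rgb_desc = rgb_feat.mean(axis=(1, 2))   # (C,) global-pooled RGB descriptor
    d_desc = d_feat.mean(axis=(1, 2))       # (C,) global-pooled depth descriptor
    # One score per modality; w_proj is a stand-in for a learned projection.
    scores = np.array([rgb_desc @ w_proj, d_desc @ w_proj])
    e = np.exp(scores - scores.max())       # softmax over the two modalities
    return e / e.sum()

def modality_aware_fuse(rgb_feat, d_feat, w_proj):
    """Fuse with one scalar weight per modality, applied to the whole tensor."""
    w_rgb, w_d = modality_weights(rgb_feat, d_feat, w_proj)
    return w_rgb * rgb_feat + w_d * d_feat

# Toy features: (channels, height, width) maps for each modality.
C, H, W = 16, 8, 8
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
proj = rng.standard_normal(C)
fused = modality_aware_fuse(rgb, depth, proj)
```

The point of the sketch is that a degraded depth map drags down a *single* modality-level gate, rather than having its unreliability negotiated independently in every feature slice.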

Updated: 2024-08-28