A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2020-07-14 , DOI: arxiv-2007.06811
Xiaoqi Zhao, Lihe Zhang, Youwei Pang, Huchuan Lu, Lei Zhang

Existing RGB-D salient object detection (SOD) approaches concentrate on the cross-modal fusion between the RGB stream and the depth stream, but do not deeply explore the effect of the depth map itself. In this work, we design a single-stream network that uses the depth map directly to guide both early fusion and middle fusion between RGB and depth, which removes the need for a separate depth-stream feature encoder and yields a lightweight, real-time model. We exploit depth information from two perspectives: (1) To overcome the incompatibility caused by the large difference between the two modalities, we build a single-stream encoder that achieves early fusion and can take full advantage of an ImageNet-pretrained backbone to extract rich, discriminative features. (2) We design a novel depth-enhanced dual attention module (DEDA) that efficiently supplies the foreground and background branches with spatially filtered features, enabling the decoder to perform middle fusion optimally. In addition, we propose a pyramidally attended feature extraction module (PAFE) to accurately localize objects of different scales. Extensive experiments demonstrate that the proposed model performs favorably against most state-of-the-art methods under different evaluation metrics. Furthermore, the model is 55.5% lighter than the current lightest model and runs in real time at 32 FPS when processing a $384 \times 384$ image.
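The early-fusion idea in (1) can be illustrated with a minimal sketch: the depth map is appended to the RGB image as a fourth channel, and a pointwise (1x1) projection maps the 4-channel input back to 3 channels so that a standard ImageNet-pretrained backbone can consume the fused tensor. This is an illustrative toy in NumPy, not the authors' implementation; the function name `early_fusion`, the fixed random projection matrix, and the shapes are assumptions made for the example.

```python
import numpy as np

def early_fusion(rgb: np.ndarray, depth: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Fuse RGB (H, W, 3) with a depth map (H, W) into a 3-channel input.

    proj is a (4, 3) matrix standing in for a learned 1x1 projection layer.
    """
    # Concatenate depth as a fourth channel: (H, W, 4)
    rgbd = np.concatenate([rgb, depth[..., None]], axis=-1)
    # Pointwise projection back to 3 channels, applied per pixel
    return rgbd @ proj

rng = np.random.default_rng(0)
rgb = rng.random((384, 384, 3)).astype(np.float32)    # abstract's input resolution
depth = rng.random((384, 384)).astype(np.float32)
proj = rng.random((4, 3)).astype(np.float32)          # stand-in for trained weights

fused = early_fusion(rgb, depth, proj)
print(fused.shape)  # (384, 384, 3)
```

Because the fused tensor has the same shape as an ordinary RGB image, no second encoder is needed for the depth stream, which is where the lightweight, single-stream design comes from.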

Updated: 2020-07-16