Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4) Pub Date: 2021-05-03, DOI: 10.1109/tnnls.2020.2996406
Deng-Ping Fan, Zheng Lin, Zhao Zhang, Menglong Zhu, Ming-Ming Cheng

The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes from various viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research, and we systematically summarize 32 popular models and evaluate 18 parts of 32 models on seven data sets containing a total of about 97k images; and 3) we propose a simple general architecture, called deep depth-depurator network (D3Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth-map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of all prior contenders across the five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D3Net can be used to efficiently extract salient object masks from real scenes, enabling an effective background-changing application at 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D3Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
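To make the gated three-stream idea in the abstract concrete, below is a minimal PyTorch-style sketch: a small depth-quality scorer stands in for the DDU, and three lightweight prediction streams (RGB, depth, and fused RGB-D) stand in for the FLM. All module sizes, the quality threshold `tau`, and the combination rule are illustrative assumptions made for this sketch, not the authors' implementation; the real D3Net code is at the GitHub link above.

```python
# Minimal sketch of a DDU-gated three-stream RGB-D SOD model.
# Everything here (layer sizes, threshold, fusion rule) is an assumption.
import torch
import torch.nn as nn

class DepthDepurator(nn.Module):
    """Scores per-image depth-map quality in (0, 1). A tiny CNN
    classifier stands in for the paper's depth depurator unit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, 1), nn.Sigmoid(),
        )

    def forward(self, depth):
        return self.net(depth)  # shape (N, 1)

def make_stream(in_ch):
    # Placeholder saliency-prediction stream; the paper uses
    # much stronger backbones for its feature learning module.
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),
    )

class D3NetSketch(nn.Module):
    def __init__(self, tau=0.5):  # tau: assumed quality threshold
        super().__init__()
        self.ddu = DepthDepurator()
        self.rgb_stream = make_stream(3)    # RGB only
        self.depth_stream = make_stream(1)  # depth only
        self.rgbd_stream = make_stream(4)   # fused RGB-D
        self.tau = tau

    def forward(self, rgb, depth):
        q = self.ddu(depth)  # per-image depth quality score
        rgb_only = self.rgb_stream(rgb)
        d_only = self.depth_stream(depth)
        rgbd = self.rgbd_stream(torch.cat([rgb, depth], dim=1))
        # Gate: when the depth map looks unreliable, fall back to the
        # pure RGB stream instead of fusing a noisy modality.
        use_depth = (q > self.tau).float().view(-1, 1, 1, 1)
        fused = 0.5 * (rgbd + d_only)  # assumed combination rule
        return use_depth * fused + (1 - use_depth) * rgb_only

model = D3NetSketch()
pred = model(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 224, 224))
print(pred.shape)  # torch.Size([2, 1, 224, 224])
```

The hard gate mirrors the abstract's point that low-quality depth maps should be filtered out rather than fused blindly; in the published model, the DDU and the three streams are designed as a nested structure and trained jointly.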

Update date: 2020-06-03