Boundary-induced and scene-aggregated network for monocular depth prediction, Pattern Recognition


Pattern Recognition (IF 7.5), Pub Date: 2021-02-18, DOI: 10.1016/j.patcog.2021.107901
Feng Xue, Junfeng Cao, Yu Zhou, Fei Sheng, Yankai Wang, Anlong Ming

Monocular depth prediction is an important task in scene understanding. It aims to predict a dense depth map from a single RGB image. With the development of deep learning, performance on this task has improved greatly. However, two issues remain unresolved: (1) the deep features encode the wrong farthest region in a scene, which distorts the 3D structure of the predicted depth; (2) the low-level features are insufficiently utilized, which makes it even harder to estimate depth near edges where the depth changes suddenly. To tackle these two issues, we propose the Boundary-induced and Scene-aggregated network (BS-Net). In this network, the Depth Correlation Encoder (DCE) is first designed to obtain the contextual correlations between the regions of an image and to perceive the farthest region by considering these correlations. Meanwhile, the Bottom-Up Boundary Fusion (BUBF) module is designed to extract accurate boundaries that indicate depth changes. Finally, the Stripe Refinement module (SRM) is designed to refine the dense depth under the guidance of the boundary cue, which improves the boundary accuracy of the predicted depth. Experimental results on the NYUD v2 and iBims-1 datasets demonstrate the state-of-the-art performance of the proposed approach, and the SUN-RGBD dataset is used to evaluate the generalization of the method. Code is available at https://github.com/XuefengBUPT/BS-Net.
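To make the phrase "boundary that indicates depth change" concrete, the following is a minimal sketch that marks depth boundaries in a toy depth map with a fixed threshold on neighboring depth differences. It is only an illustration of the concept, not the learned BUBF module or any code from the BS-Net repository; the function name `depth_boundaries`, the threshold value, and the toy data are all assumptions made for this example.

```python
def depth_boundaries(depth, threshold):
    """Mark both pixels of every horizontal or vertical neighbor pair
    whose depth values differ by more than `threshold`.

    Hypothetical helper for illustration only; BS-Net learns its
    boundaries with a network rather than using a fixed rule like this.
    """
    rows, cols = len(depth), len(depth[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Depth jump between this pixel and its right neighbor.
            if c + 1 < cols and abs(depth[r][c] - depth[r][c + 1]) > threshold:
                mask[r][c] = True
                mask[r][c + 1] = True
            # Depth jump between this pixel and its lower neighbor.
            if r + 1 < rows and abs(depth[r][c] - depth[r + 1][c]) > threshold:
                mask[r][c] = True
                mask[r + 1][c] = True
    return mask


# Toy depth map in meters (assumed values): a near object on the left
# and a far wall on the right, with small noise in the last row.
toy_depth = [
    [1.0, 1.0, 4.0, 4.0],
    [1.0, 1.0, 4.0, 4.0],
    [1.0, 1.1, 4.0, 4.1],
]
boundary_mask = depth_boundaries(toy_depth, threshold=0.5)
for row in boundary_mask:
    print("".join("#" if is_edge else "." for is_edge in row))
```

On this toy input, only the two columns on either side of the jump from about 1 m to about 4 m are marked, while the small variations of 0.1 m are ignored. Pixels like these are where, according to the abstract, depth is hardest to estimate.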


Updated: 2021-03-01
