A contextual conditional random field network for monocular depth estimation

Image and Vision Computing (IF 4.2), Pub Date: 2020-04-23, DOI: 10.1016/j.imavis.2020.103922
Jun Liu, Qing Li, Rui Cao, Wenming Tang, Guoping Qiu

Monocular depth estimation plays a crucial role in understanding 3D scene geometry and is a challenging computer vision task. Recently, deep convolutional neural networks have been applied to this problem. However, existing methods either directly exploit RGB pixels, which can introduce considerable noise into the depth map, or rely on over-smoothed internal representation features, which can blur the depth map. In this paper, we propose a contextual CRF network (CCN) to tackle these issues. The CCN adopts the popular encoder-decoder architecture and adds a new contextual CRF module (CCM), which is guided by the depth features and regularizes the information flow from each encoder layer to the corresponding decoder layer. This reduces the mismatch between the RGB pixels and the depth map cues while retaining detail features, so that the network outputs a fine-grained depth map. Moreover, we propose a depth-guided loss function that pays balanced attention to near and far pixels, thereby addressing the long-tailed distribution of depth information. We have conducted extensive experiments on three public datasets for monocular depth estimation. The results demonstrate that the proposed CCN achieves superior visual quality and competitive quantitative results compared with state-of-the-art methods.

Updated: 2020-04-23
