Contextualized CNN for Scene-Aware Depth Estimation from Single RGB Image
IEEE Transactions on Multimedia (IF 7.3) Pub Date: 2020-05-01, DOI: 10.1109/tmm.2019.2941776
Wenfeng Song, Shuai Li, Ji Liu, Aimin Hao, Qinping Zhao, Hong Qin

Benefiting directly from deep learning techniques, depth estimation from a single image has gained great momentum in recent years. However, most existing approaches treat depth prediction as an isolated problem without taking high-level semantic context into consideration, which results in inefficient utilization of the training dataset and unavoidably requires a large amount of captured depth data during the training phase. To ameliorate this, this paper develops a novel scene-aware contextualized convolutional neural network (CCNN), which characterizes semantic context relationships at the class level and refines depth at the pixel level. Our newly-proposed CCNN is built upon the intrinsic exploitation of context-dependent depth associations, including priors on continuous depth within an object and on depth changes between nearby objects. Specifically, rather than regressing depth in a single CNN, we make the first attempt to integrate both class-level and pixel-level conditional random field (CRF) based probabilistic graphical models into the powerful CNN framework to simultaneously learn features at different levels within the same CNN layer. With our CCNN, the former model guides the latter to learn the contextualized RGB-to-depth mapping. Hence, CCNN has desirable properties in both class-level integrity and pixel-level discrimination, which makes it ideal for sharing such two-level convolutional features in parallel during end-to-end training with the commonly-used back-propagation algorithm. We conduct extensive experiments and comprehensive evaluations on public benchmarks involving various indoor and outdoor scenes, and all the experiments confirm that our method outperforms state-of-the-art depth estimation methods, especially in cases where only small-scale training data are readily available.
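To make the class-level/pixel-level coupling described in the abstract concrete, the following is a minimal PyTorch sketch of a shared backbone feeding two parallel heads: one producing class-level semantic logits, and one regressing pixel-level depth conditioned on those semantics. All names, layer sizes, and the concatenation-based conditioning are hypothetical illustrations of two-level feature sharing; the paper's actual CRF inference layers are not reproduced here.

import torch
import torch.nn as nn

class CCNNSketch(nn.Module):
    """Illustrative stand-in, not the authors' implementation."""
    def __init__(self, num_classes=40):
        super().__init__()
        # Shared convolutional features (stand-in for a deep backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Class-level head: per-pixel semantic logits (scene context).
        self.class_head = nn.Conv2d(64, num_classes, 1)
        # Pixel-level head: depth regression, conditioned on semantics
        # via feature concatenation (a simple form of "guidance").
        self.depth_head = nn.Conv2d(64 + num_classes, 1, 1)

    def forward(self, rgb):
        feats = self.backbone(rgb)
        sem_logits = self.class_head(feats)
        # Semantic probabilities guide the depth prediction, echoing
        # the "former model guides the latter" coupling above.
        sem_probs = sem_logits.softmax(dim=1)
        depth = self.depth_head(torch.cat([feats, sem_probs], dim=1))
        return sem_logits, depth

if __name__ == "__main__":
    model = CCNNSketch()
    sem, depth = model(torch.randn(1, 3, 64, 64))
    print(sem.shape, depth.shape)  # (1, 40, 64, 64), (1, 1, 64, 64)

In an end-to-end setup of this kind, a joint objective (for example, cross-entropy on the semantic logits plus an L1-style loss on depth) would be back-propagated through both heads and the shared features simultaneously, which is the training regime the abstract describes.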

Updated: 2020-05-01