A lightweight network for monocular depth estimation with decoupled body and edge supervision, Image and Vision Computing



Image and Vision Computing (IF 4.2), Pub Date: 2021-07-27, DOI: 10.1016/j.imavis.2021.104261
Usman Ali, Bayram Bayramli, Tamam Alsarhan, Hongtao Lu


Learning depth from a single image is a challenging task in computer vision. Many recent works on monocular depth estimation explore increasingly large convolutional neural networks to learn monocular cues implicitly. Such methods may fail to generalize well around object boundaries, because large networks tend to distort fine details (such as edges and corners) in low-resolution layers, leading to poor depth prediction near object edges. To reduce the loss of depth accuracy near object boundaries, this paper proposes to explicitly decouple depth features for the body and the edges of objects, which correspond to the low-frequency and high-frequency regions of an image, respectively. To this end, we learn a flow field that warps depth features into consistent body features and residual edge features. Decoupled supervision is then applied to both sets of features to learn body and edge depth maps explicitly. We also propose a lightweight encoder-decoder network that efficiently combines features at multiple scales to alleviate the loss of fine details in the final feature map. Extensive experiments on the NYUD-v2 and KITTI datasets demonstrate that the proposed lightweight network with depth decoupling performs comparably to state-of-the-art methods while drastically reducing the number of parameters.
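The decoupling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the code below assumes a hand-written flow field (the paper learns it), a single-channel feature map, bilinear sampling with border clamping, and an edge map defined as the residual between the original and warped features. The names `bilinear_sample` and `decouple` are hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the paper): the flow is
# hand-written rather than learned, features are single-channel, and the
# edge part is taken as the plain residual feat - body.
from math import floor


def bilinear_sample(feat, y, x):
    """Sample a 2D grid at fractional (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(floor(y)), int(floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def decouple(feat, flow):
    """Warp feat with per-pixel (dy, dx) offsets to get body features,
    then take the residual as edge features."""
    h, w = len(feat), len(feat[0])
    body = [
        [bilinear_sample(feat, i + flow[i][j][0], j + flow[i][j][1])
         for j in range(w)]
        for i in range(h)
    ]
    edge = [[feat[i][j] - body[i][j] for j in range(w)] for i in range(h)]
    return body, edge


# A blurred step between two flat regions (values 1 and 5).
feat = [[1.0, 1.0, 2.0, 4.0, 5.0, 5.0]]
# Boundary pixels point toward the interior of their own region.
flow = [[(0.0, 0.0), (0.0, 0.0), (0.0, -1.0),
         (0.0, 1.0), (0.0, 0.0), (0.0, 0.0)]]

body, edge = decouple(feat, flow)
print(body)  # [[1.0, 1.0, 1.0, 5.0, 5.0, 5.0]]
print(edge)  # [[0.0, 0.0, 1.0, -1.0, 0.0, 0.0]]
```

In this toy case the warped body features are piecewise constant, and the edge residual is nonzero only at the two boundary pixels, so body plus edge reproduces the input exactly. In a trained network the two parts would each receive their own supervision signal, the details of which the abstract does not give.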


Updated: 2021-08-07
