A self-supervised monocular depth estimation model with scale recovery and transfer learning for construction scene analysis
Computer-Aided Civil and Infrastructure Engineering (IF 8.5). Pub Date: 2022-10-28. DOI: 10.1111/mice.12938
Jie Shen, Wenjie Yan, Shengxian Qin, Xiaoyu Zheng

Estimating the depth of a construction scene from a single red-green-blue (RGB) image is a crucial prerequisite for various applications, including work zone safety, localization, productivity analysis, activity recognition, and scene understanding. Recently, self-supervised representation learning methods have made significant progress and demonstrated state-of-the-art performance on monocular depth estimation. However, two leading open challenges remain: the ambiguity of depth estimated only up to an unknown scale, and the transferability of learned representations to downstream tasks, both of which severely hinder the practical deployment of self-supervised methods. We propose a prior information-based method that does not depend on additional sensors to recover the unknown scale in monocular vision and predict per-pixel absolute depth. Moreover, a new learning paradigm for a self-supervised monocular depth estimation model is constructed to transfer the pre-trained self-supervised model to other downstream construction scene analysis tasks. We also propose a novel depth loss that enforces depth consistency when transferring to a new downstream task, together with two new metrics to measure transfer performance. Finally, we verify the effectiveness of scale recovery and representation transferability in isolation. The new learning paradigm, combined with our metrics and depth loss, is expected to enable monocular depth estimation of a construction scene without depth ground truth such as that from light detection and ranging (LiDAR). Our models will serve as a good foundation for further construction scene analysis tasks.
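The abstract does not spell out which prior is used for scale recovery or the exact form of the depth loss. As a rough, non-authoritative illustration of the two ideas, the Python sketch below assumes the prior is a known camera mounting height above a flat ground plane and rescales the relative depth map by a median ratio, and it pairs this with a generic log-L1 depth consistency term; all names, values, and loss forms are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def recover_scale(rel_depth: np.ndarray,
                  ground_mask: np.ndarray,
                  camera_height_m: float,
                  fy: float,
                  cy: float,
                  v_coords: np.ndarray) -> np.ndarray:
    """Rescale an up-to-scale depth map to metric depth.

    Assumes the only prior is the known camera mounting height above a
    flat ground plane. Under a pinhole model with a horizontal optical
    axis, a ground pixel in image row v (below the principal point cy)
    implies a metric depth z = camera_height_m * fy / (v - cy), so a
    single global scale is read off as the median ratio between this
    geometric depth and the network's relative depth on ground pixels.
    """
    v = v_coords[ground_mask]
    geometric_depth = camera_height_m * fy / np.clip(v - cy, 1e-6, None)
    scale = np.median(geometric_depth / rel_depth[ground_mask])
    return scale * rel_depth


def depth_consistency_loss(student_depth: np.ndarray,
                           teacher_depth: np.ndarray) -> float:
    """Generic log-L1 consistency term between depth predicted during
    downstream fine-tuning and the frozen pre-trained model's depth;
    the paper's actual loss may be formulated differently."""
    return float(np.mean(np.abs(np.log(student_depth) - np.log(teacher_depth))))


if __name__ == "__main__":
    # Hypothetical usage: rel_depth would come from the self-supervised
    # network, ground_mask from a rough segmentation of the site floor.
    H, W = 192, 640
    rng = np.random.default_rng(0)
    rel_depth = rng.uniform(0.1, 1.0, size=(H, W))
    v_coords = np.repeat(np.arange(H, dtype=float)[:, None], W, axis=1)
    ground_mask = v_coords > 150.0   # assume the bottom rows see the ground
    metric_depth = recover_scale(rel_depth, ground_mask,
                                 camera_height_m=1.6, fy=720.0, cy=96.0,
                                 v_coords=v_coords)
    print(metric_depth.mean())
```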

Updated: 2022-10-28