RGB-Fusion: Monocular 3D reconstruction with learned depth prediction
Displays (IF 3.7) Pub Date: 2021-09-28, DOI: 10.1016/j.displa.2021.102100
ZhiMin Duan, YingWen Chen, HuJie Yu, BoWen Hu, Chen Chen

Generating large-scale, high-quality 3D scene reconstructions from monocular images is an essential technical foundation for augmented reality and robotics. However, inherent shortcomings (e.g., scale ambiguity and dense depth estimation in texture-less areas) make applying monocular 3D reconstruction in real-world practice challenging. In this work, we combine the advantages of deep learning and multi-view geometry to propose RGB-Fusion, which effectively addresses the inherent limitations of traditional monocular reconstruction. To remove the limits that the prediction deficiencies of neural networks impose on tracking accuracy, we integrate the PnP (Perspective-n-Point) algorithm into the tracking module. We employ 3D ICP (Iterative Closest Point) matching and 2D feature matching to construct separate error terms and optimize them jointly, reducing the dependence on depth prediction accuracy and improving pose estimation accuracy. The approximate pose predicted by the neural network is used as the initial value of the optimization to avoid being trapped in local minima. We formulate a depth map refinement strategy based on the uncertainty of each depth value, which naturally yields a refined depth map: low-uncertainty elements significantly update the current depth estimate, while high-uncertainty elements are prevented from adversely affecting depth estimation accuracy. Qualitative and quantitative evaluations of tracking, depth prediction, and 3D reconstruction show that RGB-Fusion outperforms most monocular 3D reconstruction systems.
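Two mechanisms named in the abstract lend themselves to short sketches. First, the tracking module initializes pose optimization with the network-predicted pose and refines it against 2D-3D correspondences with PnP. A minimal sketch of that idea using OpenCV's iterative PnP solver; the function name, inputs, and correspondence source here are illustrative assumptions, not the authors' code:

```python
import cv2
import numpy as np

def refine_pose_pnp(pts3d, pts2d, K, rvec_init, tvec_init):
    """Refine a network-predicted pose with iterative PnP.

    pts3d: (N, 3) model points matched to the current frame (assumed input).
    pts2d: (N, 2) corresponding 2D feature locations.
    K:     (3, 3) camera intrinsics.
    rvec_init, tvec_init: rotation (Rodrigues vector) and translation
    predicted by the pose network, used as the initial guess so the
    iterative solver starts near the correct basin of attraction.
    """
    ok, rvec, tvec = cv2.solvePnP(
        pts3d.astype(np.float64),
        pts2d.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        rvec=rvec_init,
        tvec=tvec_init,
        useExtrinsicGuess=True,   # start from the network's pose
        flags=cv2.SOLVEPNP_ITERATIVE,
    )
    return (rvec, tvec) if ok else (rvec_init, tvec_init)
```

Second, the uncertainty-based depth refinement can be read as per-pixel inverse-variance fusion, in which low-variance (low-uncertainty) observations dominate the update and high-variance ones barely move the current estimate. A hedged numpy sketch of that reading; the variance model is an assumption, since the paper's exact update rule is not given in the abstract:

```python
import numpy as np

def fuse_depth(d_cur, var_cur, d_new, var_new):
    """Per-pixel inverse-variance fusion of two depth maps (HxW arrays).

    The weight given to the new observation shrinks as its variance
    grows, so high-uncertainty predictions leave the current depth
    nearly unchanged while confident ones update it strongly.
    """
    w_new = var_cur / (var_cur + var_new)            # weight of new obs
    d_fused = (1.0 - w_new) * d_cur + w_new * d_new  # convex combination
    var_fused = (var_cur * var_new) / (var_cur + var_new)
    return d_fused, var_fused
```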




Updated: 2021-10-09