Unsupervised monocular visual odometry with decoupled camera pose estimation
Digital Signal Processing ( IF 2.9 ) Pub Date : 2021-04-09 , DOI: 10.1016/j.dsp.2021.103052
Lili Lin , Weisheng Wang , Wan Luo , Lesheng Song , Wenhui Zhou

Drift, or error accumulation, is an inevitable challenge in visual odometry (VO). To alleviate this issue, most learning-based VO methods focus on various long- and short-term sequential learning schemes, while losing sight of the fact that inaccurate rotation estimates are the main source of VO drift. They usually estimate the six degrees of freedom (DoFs) of the camera motion simultaneously, without considering the inherent rotation-translation ambiguity. In this paper, we start from the design of a cascaded decoupled structure and a residual-based decoupled pose refinement scheme for accurate pose estimation. We then extend them to an unsupervised monocular VO framework, which estimates 3D camera poses by decoupling the estimation of rotation, translation, and scale. Our VO model consists of three components: monocular depth estimation, decoupled pose estimation, and decoupled pose refinement. The first component learns metric scale and depth cues from stereo pairs during training, and predicts the absolute depth of monocular inputs. The latter two separate the estimation and refinement of rotation and translation. To improve the robustness of the rotation estimation, we use the unit quaternion, instead of Euler angles, to represent 3D rotation. We have evaluated our model on the KITTI Visual Odometry Evaluation benchmark. Comparison experiments demonstrate that our method outperforms state-of-the-art unsupervised VO methods and achieves results comparable to supervised ones.
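As an aside on why the unit quaternion is a convenient rotation representation for a learned pose network: any non-zero 4-vector output can simply be normalized into a valid rotation, whereas Euler angles suffer from gimbal lock and wrap-around discontinuities. The sketch below is a generic quaternion-to-rotation-matrix conversion for illustration only, not code from the paper.

```python
import numpy as np

def quat_to_rotation_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The input is normalized first, so any non-zero 4-vector is a valid
    prediction -- a property that makes quaternions attractive as a
    network output compared with Euler angles.
    """
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

# Identity quaternion maps to the identity rotation.
R = quat_to_rotation_matrix(np.array([1.0, 0.0, 0.0, 0.0]))
print(np.allclose(R, np.eye(3)))  # True
```

Because the conversion always yields an orthonormal matrix, a pose network predicting quaternions never has to learn the rotation-matrix constraints explicitly.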




Updated: 2021-04-13