Incorporating learnt local and global embeddings into monocular visual SLAM
Autonomous Robots (IF 3.5), Pub Date: 2021-08-05, DOI: 10.1007/s10514-021-10007-8
Huaiyang Huang, Lujia Wang, Ming Liu, Haoyang Ye, Yuxiang Sun

Traditional approaches to Visual Simultaneous Localization and Mapping (VSLAM) rely on low-level visual information for state estimation, such as handcrafted local features or image gradients. While significant progress has been made along this line, the performance of state-of-the-art systems generally degrades under more challenging configurations for monocular VSLAM, e.g., varying illumination. As a consequence, the robustness and accuracy of monocular VSLAM remain a widespread concern. This paper presents a monocular VSLAM system that fully exploits learnt features for better state estimation. The proposed system leverages both learnt local features and global embeddings in different modules of the system: direct camera pose estimation, inter-frame feature association, and loop closure detection. With a probabilistic interpretation of keypoint prediction, we formulate camera pose tracking in a direct manner and parameterize local features with their uncertainty taken into account. To alleviate the quantization effect, we adapt the mapping module to generate better 3D landmarks, guaranteeing the system's robustness. Detecting temporal loop closures via deep global embeddings further improves the robustness and accuracy of the proposed system. The proposed system is extensively evaluated on public datasets (Tsukuba, EuRoC, and KITTI) and compared against state-of-the-art methods. The competitive performance of camera pose estimation confirms the effectiveness of our method.
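The abstract does not give implementation details, but the loop-closure idea it describes — comparing a deep global embedding of the current keyframe against embeddings of past keyframes — is commonly realized as a nearest-neighbour search under cosine similarity. The sketch below is a minimal illustration of that retrieval step only; the function name, the threshold value, and the use of plain NumPy are assumptions, not part of the paper:

```python
import numpy as np

def detect_loop_candidates(query_embedding, keyframe_embeddings, threshold=0.8):
    """Return indices of past keyframes whose global embedding is
    sufficiently similar (cosine similarity) to the query keyframe.

    query_embedding:     (D,) global descriptor of the current keyframe
    keyframe_embeddings: (N, D) descriptors of previously stored keyframes
    """
    # Normalize so that a dot product equals cosine similarity.
    q = query_embedding / np.linalg.norm(query_embedding)
    db = keyframe_embeddings / np.linalg.norm(
        keyframe_embeddings, axis=1, keepdims=True
    )
    similarities = db @ q
    return [i for i, s in enumerate(similarities) if s >= threshold]
```

In a full system the candidates returned here would still be verified geometrically (e.g., by local-feature matching and pose estimation) before a loop constraint is added to the graph.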




Updated: 2021-08-10