Towards Better Generalization: Joint Depth-Pose Learning without PoseNet
arXiv - CS - Robotics, Pub Date: 2020-04-03, DOI: arxiv-2004.01314
Wang Zhao, Shaohui Liu, Yezhi Shu, Yong-Jin Liu

In this work, we tackle the essential problem of scale inconsistency in self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder and results in degraded performance and limited generalization in indoor environments and long-sequence visual odometry applications. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on a PoseNet architecture, our method recovers relative pose by directly solving for the fundamental matrix from dense optical flow correspondences and uses a two-view triangulation module to recover an up-to-scale 3D structure. We then align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and a dense reprojection check. The whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on the KITTI Odometry and NYUv2 datasets. Furthermore, we present some interesting findings on the limitations of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.
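As a rough illustration of two steps named in the abstract, the NumPy sketch below estimates a fundamental matrix from flow correspondences and then rescales a depth prediction to match reference depths. It is not taken from the TrianFlow repository. The normalized eight-point solver, the median-ratio scale estimate, and every function name here (`fundamental_from_flow`, `align_depth_scale`, `synthetic_scene`, and so on) are assumptions chosen for illustration; the abstract does not say which solver or alignment rule the authors use. Pose recovery and triangulation are omitted, so the synthetic ground-truth depths stand in for triangulated depths.

```python
import numpy as np


def normalize_points(pts):
    """Hartley normalization: centroid at origin, mean distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T


def fundamental_from_flow(pts1, flow):
    """Normalized eight-point estimate of F such that x2^T F x1 = 0.

    pts1: (N, 2) pixel coordinates in frame 1; flow: (N, 2) displacements,
    so the matching points in frame 2 are pts1 + flow. Requires N >= 8.
    """
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts1 + flow)
    # Row n holds the outer product x2_n x1_n^T flattened row-major,
    # so that (row n) . vec(F) = x2_n^T F x1_n.
    A = np.einsum("ni,nj->nij", x2, x1).reshape(-1, 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization and fix the arbitrary overall scale of F.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)


def epipolar_residuals(F, pts1, pts2):
    """Absolute algebraic residual |x2^T F x1| per correspondence."""
    h1 = np.column_stack([pts1, np.ones(len(pts1))])
    h2 = np.column_stack([pts2, np.ones(len(pts2))])
    return np.abs(np.einsum("ni,ij,nj->n", h2, F, h1))


def align_depth_scale(pred_depth, ref_depth):
    """Rescale pred_depth by the median ratio ref/pred (an assumed rule)."""
    valid = (pred_depth > 0) & (ref_depth > 0)
    s = np.median(ref_depth[valid] / pred_depth[valid])
    return s * pred_depth, s


def synthetic_scene(n=200, seed=0):
    """Noise-free two-view scene: returns pts1, flow, and frame-1 depths."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.uniform(-2, 2, n),
                         rng.uniform(-2, 2, n),
                         rng.uniform(4, 8, n)])
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    a = 0.05  # small rotation about the y axis, in radians
    R = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(a), 0.0, np.cos(a)]])
    t = np.array([0.5, 0.0, 0.1])
    p1 = X @ K.T
    p2 = (X @ R.T + t) @ K.T
    pts1 = p1[:, :2] / p1[:, 2:]
    pts2 = p2[:, :2] / p2[:, 2:]
    return pts1, pts2 - pts1, X[:, 2]


if __name__ == "__main__":
    pts1, flow, depth = synthetic_scene()
    F = fundamental_from_flow(pts1, flow)
    print("max epipolar residual:", epipolar_residuals(F, pts1, pts1 + flow).max())
    # A depth prediction in an arbitrary scale, then aligned to the reference.
    aligned, s = align_depth_scale(depth / 3.7, depth)
    print("recovered scale factor:", s)
```

On real flow fields the correspondences contain outliers, so a robust estimator such as RANSAC around the solver would normally be needed; the sketch skips this because the synthetic data is noise-free.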

Updated: 2020-04-06
