DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 2021-10-10. DOI: 10.1109/tpami.2021.3118833
Ivan Shugurov, Sergey Zakharov, Slobodan Ilic
We propose a three-stage 6 DoF object detection method called DPODv2 (Dense Pose Object Detector) that relies on dense correspondences. We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose. Unlike other deep learning methods, which are typically restricted to monocular RGB images, we propose a unified deep learning network that allows different imaging modalities to be used (RGB or depth). Moreover, we propose a novel pose refinement method based on differentiable rendering. The key idea is to compare predicted and rendered correspondences in multiple views to obtain a pose that is consistent with the predicted correspondences in all views. Our proposed method is evaluated rigorously on different data modalities and types of training data in a controlled setup. The main conclusion is that RGB excels in correspondence estimation, while depth contributes to pose accuracy if good 3D-3D correspondences are available. Naturally, their combination achieves the overall best performance. We perform an extensive evaluation and an ablation study to analyze and validate the results on several challenging datasets. DPODv2 achieves excellent results on all of them while remaining fast and scalable regardless of the data modality and the type of training data used.
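The multi-view refinement idea described above — adjusting a single object pose so that its reprojections agree with the predicted correspondences in every view — can be sketched in plain NumPy. This is an illustrative toy, not the paper's implementation: a numeric gradient with backtracking stands in for the differentiable renderer, and all function names and view conventions here are assumptions.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(pts, pose, K, R_view, t_view):
    """Project object-frame points given a shared object pose (rvec|tvec)
    and per-view extrinsics (R_view, t_view), pinhole intrinsics K."""
    rvec, tvec = pose[:3], pose[3:]
    cam = (pts @ rodrigues(rvec).T + tvec) @ R_view.T + t_view
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def multiview_residual(pose, views):
    """Sum of squared reprojection errors over all views.
    Each view is (pts3d, observed_uv, K, R_view, t_view)."""
    err = 0.0
    for pts, uv_obs, K, Rv, tv in views:
        err += np.sum((project(pts, pose, K, Rv, tv) - uv_obs) ** 2)
    return err

def refine_pose(pose0, views, iters=100, eps=1e-6):
    """Refine one shared 6 DoF pose against correspondences in all views,
    using a numeric gradient plus backtracking line search (a stand-in for
    the differentiable-rendering gradient in the paper)."""
    pose = pose0.astype(float).copy()
    step = 1e-4
    for _ in range(iters):
        base = multiview_residual(pose, views)
        grad = np.zeros(6)
        for i in range(6):
            p = pose.copy()
            p[i] += eps
            grad[i] = (multiview_residual(p, views) - base) / eps
        # Shrink the step until the error actually decreases.
        while step > 1e-12:
            cand = pose - step * grad
            if multiview_residual(cand, views) < base:
                pose = cand
                step *= 2.0  # cautiously grow the step again
                break
            step *= 0.5
    return pose
```

Because the objective sums reprojection errors over all views, the optimizer cannot overfit a pose to a single view: the recovered pose is the one consistent with the predicted correspondences everywhere, which is the multi-view consistency constraint the abstract describes.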
