
LHFF-Net: Local heterogeneous feature fusion network for 6DoF pose estimation

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Pose estimation from RGB-D images has attracted considerable attention in recent years. A key technical challenge is to extract features from the depth and image modalities separately while fully exploiting the two complementary data sources. Previous methods ignore the internal connections among local features and the fusion of heterogeneous data, limiting robustness and real-time performance in cluttered scenes. In this article, we propose LHFF-Net, a generic framework based on dynamic graph convolution that strengthens information aggregation among the points within a local region. After extracting heterogeneous features, we fuse information from the two data sources at different receptive fields, estimating the object's pose while fully exploiting local features. Experiments show that the proposed approach outperforms state-of-the-art methods on two challenging datasets, YCB-Video and LineMOD. We have also deployed the proposed method on a UR5 robot and achieved good grasping performance in real-world grasping experiments.
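The dynamic-graph-convolution backbone described above follows the EdgeConv idea: for each point, a k-nearest-neighbour graph is built on the fly, edge features are formed from each point and its neighbour offsets, passed through a shared transformation, and max-pooled over the neighbourhood. A minimal NumPy sketch of that local aggregation step is shown below; the names `knn_indices` and `edge_conv` and the single-layer linear map are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def knn_indices(points, k):
    # Pairwise squared Euclidean distances between all points.
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbourhood
    return np.argsort(d, axis=1)[:, :k]  # (N, k) neighbour indices

def edge_conv(features, points, k, weight):
    # EdgeConv-style aggregation: for each point i, build edge features
    # [f_i, f_j - f_i] over its k nearest neighbours j, apply a shared
    # linear map with ReLU, then max-pool over the neighbourhood.
    idx = knn_indices(points, k)                        # (N, k)
    f_i = np.repeat(features[:, None, :], k, axis=1)    # (N, k, C) centre features
    f_j = features[idx]                                 # (N, k, C) neighbour features
    edges = np.concatenate([f_i, f_j - f_i], axis=-1)   # (N, k, 2C) edge features
    out = np.maximum(edges @ weight, 0.0)               # shared "MLP" + ReLU
    return out.max(axis=1)                              # (N, C_out) max-pool
```

Because the neighbourhood is recomputed from the current coordinates (or, in deeper layers, from feature space), the graph is "dynamic" rather than fixed, which is what lets such a layer aggregate information across a local region of the point cloud.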



Author information

Corresponding author

Correspondence to Fei Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, the Fundamental Research Funds for the Central Universities of China under Grants N182612002 and N2026002, and the Central Government Guided Local Science and Technology Development Special Fund 2021JH6/10500129, supporting research on the theory and methods of dexterous and precise robotic manipulation for 3C assembly.

About this article

Cite this article

Wang, F., He, Z., Zhang, X. et al. LHFF-Net: Local heterogeneous feature fusion network for 6DoF pose estimation. Int. J. Mach. Learn. & Cyber. 12, 2795–2807 (2021). https://doi.org/10.1007/s13042-021-01364-y
