Abstract
Pose estimation from RGB-D images has attracted considerable attention in recent years. A key technical challenge is to extract features from the depth and color images separately and to fully exploit these two complementary data sources. Previous methods ignore the internal connections among local features and the fusion of heterogeneous features, which limits their robustness and real-time performance in cluttered scenes. In this article, we propose LHFF-Net, a generic framework that uses dynamic graph convolution to strengthen information aggregation among all points within a local region. After extracting the heterogeneous features, we fuse information from the two data sources over different receptive fields, estimating the object pose while fully exploiting local features. Experiments show that the proposed approach outperforms state-of-the-art methods on two challenging datasets, YCB-Video and LineMOD. We have also deployed the method on a UR5 robot for grasping experiments, where it achieved good grasping performance.
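The abstract names dynamic graph convolution and fusion over different receptive fields without giving details. As a rough illustration only, the sketch below assumes the EdgeConv operator from Dynamic Graph CNN (DGCNN): for each point, an edge feature [x_i, x_j - x_i] is formed with each of its k nearest neighbors, passed through a shared linear layer with ReLU, and max-pooled over the neighbors. It further assumes that heterogeneous fusion means per-point concatenation of a color embedding with geometric features computed at two neighborhood sizes. The function names, the single-layer weights, and the values of k are hypothetical; this is not the LHFF-Net implementation.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (excluding itself)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort by Euclidean distance; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def edge_conv(features, k, weights):
    """One EdgeConv-style layer (assumed DGCNN formulation).

    The k-NN graph is rebuilt from whatever features are passed in, which is what
    makes stacked layers "dynamic". Each row of `weights` must have length 2 * d,
    where d is the input feature dimension; the output dimension is len(weights).
    """
    neighbors = knn_indices(features, k)
    out = []
    for i, x_i in enumerate(features):
        pooled = None
        for j in neighbors[i]:
            # Edge feature: the center point plus its offset to the neighbor.
            edge = list(x_i) + [a - b for a, b in zip(features[j], x_i)]
            # Shared linear layer followed by ReLU.
            h = [max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weights]
            # Channel-wise max aggregation over the neighborhood.
            pooled = h if pooled is None else [max(a, b) for a, b in zip(pooled, h)]
        out.append(pooled)
    return out


def fuse_local_heterogeneous(points, rgb_feats, weights, ks=(4, 8)):
    """Hypothetical fusion: concatenate each point's color embedding with
    geometric features computed over several neighborhood sizes (receptive fields)."""
    fused = [list(c) for c in rgb_feats]  # copy so the inputs are not mutated
    for k in ks:
        geo = edge_conv(points, k, weights)
        for f, g in zip(fused, geo):
            f.extend(g)
    return fused
```

A larger k widens the neighborhood each point aggregates over, so concatenating the outputs for several k values gives every point both fine and coarse local geometry alongside its color embedding. A trained network would stack several such layers with learned weights instead of a single fixed weight matrix.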
Additional information
This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, by the Fundamental Research Funds for the Central Universities of China under Grants N182612002 and N2026002, and by the Central Government Guided Local Science and Technology Development Special Fund (2021JH6/10500129) for research on the theory and methods of dexterous and precise robot manipulation for 3C assembly.
Cite this article
Wang, F., He, Z., Zhang, X. et al. LHFF-Net: Local heterogeneous feature fusion network for 6DoF pose estimation. Int. J. Mach. Learn. & Cyber. 12, 2795–2807 (2021). https://doi.org/10.1007/s13042-021-01364-y