A lightweight convolutional neural network for pose estimation of a planar model
Machine Vision and Applications ( IF 2.4 ) Pub Date : 2022-03-31 , DOI: 10.1007/s00138-022-01292-z
Vladimir Ocegueda-Hernández 1 , Israel Román-Godínez 2 , Gerardo Mendizabal-Ruiz 3

The 3D pose estimation problem consists of calculating the position and orientation of a three-dimensional object, relative to a given reference frame, from its projection onto a two-dimensional image. In recent years, convolutional neural networks (CNNs) have achieved impressive results in addressing some of the traditional problems of computer vision, including 3D pose estimation. In general, the CNNs employed contain convolutional and fully connected layers with many neurons and trainable parameters; that is, they are heavyweight architectures. Such models are difficult to train, highly memory-consuming, and, as the number of trainable parameters increases, they tend to suffer from overfitting. In this work, we present a lightweight CNN called Pose Network with Spatial Pyramid Pooling (PNSPP), capable of estimating the six-degree-of-freedom pose of a planar model from a single RGB image. Inspired by PoseNet, our CNN employs almost the same architecture but contains 4× fewer parameters (for a chosen image size) thanks to its optimized regression layers. In all tests, PNSPP outperformed PoseNet in the pose predictions. The overall relative improvements were in the ranges of 24–40% and 9–33% for the estimated position and orientation errors, respectively. Other performance metrics, such as RMSE and ADD, also favored PNSPP. Finally, we propose a method that estimates the scale factor \(\beta\) used in the pose error functions to balance the contributions of the position and orientation terms. Unlike other approaches that perform potentially expensive grid or random searches, our method uses simple heuristics to adjust this value as the neural network training progresses. At the end of each experiment, the estimated \(\beta\) values deviated roughly ±10% from the optimal values, which in our case seems reasonable given the computational cost of performing more exhaustive searches.
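
To make the role of the scale factor \(\beta\) concrete, the following is a minimal sketch of a PoseNet-style pose error, in which the position error and the orientation error are summed with \(\beta\) weighting the orientation term. The abstract does not give the exact error functions used for PNSPP, so the form below (Euclidean distances, unit quaternions for orientation) is an assumption borrowed from the original PoseNet formulation, and the name `pose_loss` is hypothetical.

```python
import numpy as np


def pose_loss(t_pred, q_pred, t_true, q_true, beta):
    """PoseNet-style pose error (assumed form, not taken from the PNSPP paper).

    t_* are 3-vectors (position), q_* are 4-vectors (orientation quaternions).
    Returns (total, position_error, orientation_error).
    """
    t_pred = np.asarray(t_pred, dtype=float)
    t_true = np.asarray(t_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    q_true = np.asarray(q_true, dtype=float)

    # Normalize both quaternions so the orientation term compares unit vectors.
    q_pred = q_pred / np.linalg.norm(q_pred)
    q_true = q_true / np.linalg.norm(q_true)

    position_error = np.linalg.norm(t_true - t_pred)
    # Note: q and -q encode the same rotation; this simple distance ignores that.
    orientation_error = np.linalg.norm(q_true - q_pred)

    # beta balances the two terms, which generally live on different scales.
    total = position_error + beta * orientation_error
    return total, position_error, orientation_error


if __name__ == "__main__":
    total, pos_err, ori_err = pose_loss(
        t_pred=[1.0, 0.0, 0.0], q_pred=[0.0, 1.0, 0.0, 0.0],
        t_true=[0.0, 0.0, 0.0], q_true=[1.0, 0.0, 0.0, 0.0],
        beta=2.0,
    )
    print(total, pos_err, ori_err)
```

With a small \(\beta\) the position term dominates the total, and with a large \(\beta\) the orientation term does, which is why the value has to be chosen per dataset, whether by search or by a heuristic such as the one the authors propose.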




Updated: 2022-03-31