PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning
Pattern Recognition (IF 8), Pub Date: 2020-06-01, DOI: 10.1016/j.patcog.2019.107187
Guangyao Zhai , Liang Liu , Linjian Zhang , Yong Liu , Yunliang Jiang

While many visual ego-motion algorithm variants have been proposed in the past decade, learning-based ego-motion estimation methods have attracted increasing attention because of their desirable properties: robustness to image noise and independence from camera calibration. In this work, we propose a data-driven, fully trainable approach to visual ego-motion estimation for a monocular camera. We use an end-to-end learning approach that allows the model to map directly from input image pairs to an estimate of ego-motion (parameterized as 6-DoF transformation matrices). To achieve this, we introduce a novel two-module Long-term Recurrent Convolutional Neural Network called PoseConvGRU, trained with an explicit sequence pose estimation loss. The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature across consecutive image pairs. The visual memory is implemented with convolutional gated recurrent units, which allow information to propagate over time. At each time step, two consecutive RGB images are stacked together to form a 6-channel tensor, from which the feature-encoding module learns to extract motion information and estimate poses. The sequence of output feature maps is then passed through a stacked ConvGRU module to generate the relative transformation pose of each image pair. We also augment the training data by randomly skipping frames to simulate velocity variation, which yields better performance in turning and high-velocity situations. We evaluate the performance of our proposed approach on the KITTI Visual Odometry benchmark. The experiments show that the proposed method is competitive with geometric methods, and they encourage further exploration of learning-based methods for estimating camera ego-motion, even though geometric methods already demonstrate promising results.
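The two input-side ideas in the abstract, stacking consecutive RGB frames into a 6-channel tensor and augmenting by randomly skipping frames to simulate velocity changes, can be sketched as follows. This is a minimal illustration only, assuming NumPy arrays for frames; the function names `make_pair` and `skip_augment` and the `max_skip` parameter are hypothetical, not from the paper:

```python
import random
import numpy as np

def make_pair(frame_a, frame_b):
    """Stack two HxWx3 RGB frames along the channel axis into one
    HxWx6 tensor, the input format the feature-encoding module consumes."""
    assert frame_a.shape == frame_b.shape and frame_a.shape[-1] == 3
    return np.concatenate([frame_a, frame_b], axis=-1)

def skip_augment(frames, max_skip=2, rng=None):
    """Build training pairs while randomly skipping up to `max_skip`
    intermediate frames, simulating higher ego-velocity (a sketch of
    the frame-skipping augmentation described in the abstract)."""
    rng = rng or random.Random()
    pairs, i = [], 0
    while i + 1 < len(frames):
        step = rng.randint(1, max_skip + 1)   # 1 = consecutive frames
        j = min(i + step, len(frames) - 1)
        pairs.append(make_pair(frames[i], frames[j]))
        i = j
    return pairs
```

In a full pipeline, each stacked pair would be fed to the feature-encoding CNN, and the resulting per-pair features would pass through the stacked ConvGRU to produce a 6-DoF relative pose per pair; those stages are omitted here.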
