Self-Supervised Visual Terrain Classification From Unsupervised Acoustic Feature Learning
IEEE Transactions on Robotics (IF 7.8), Pub Date: 2020-01-01, DOI: 10.1109/tro.2020.3031214
Jannik Zürn, Wolfram Burgard, Abhinav Valada

Mobile robots operating in unknown urban environments encounter a wide range of complex terrains to which they must adapt their planned trajectory for safe and efficient navigation. Most existing approaches utilize supervised learning to classify terrains from either an exteroceptive or a proprioceptive sensor modality. However, this requires a tremendous amount of manual labeling effort for each newly encountered terrain as well as for variations of terrains caused by changing environmental conditions. In this work, we propose a novel terrain classification framework leveraging an unsupervised proprioceptive classifier that learns from vehicle-terrain interaction sounds to self-supervise an exteroceptive classifier for pixel-wise semantic segmentation of images. To this end, we first learn a discriminative embedding space for vehicle-terrain interaction sounds from triplets of audio clips formed using visual features of the corresponding terrain patches and cluster the resulting embeddings. We subsequently use these clusters to label the visual terrain patches by projecting the traversed tracks of the robot into the camera images. Finally, we use the sparsely labeled images to train our semantic segmentation network in a weakly supervised manner. We present extensive quantitative and qualitative results that demonstrate that our proprioceptive terrain classifier exceeds the state-of-the-art among unsupervised methods and our self-supervised exteroceptive semantic segmentation model achieves a comparable performance to supervised learning with manually labeled data.
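The first stage of the pipeline described above (learning a discriminative audio embedding from triplets, then clustering the embeddings to obtain self-supervised terrain labels) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the triplet margin, the 8-dimensional embeddings, and the minimal k-means with farthest-point initialisation are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: pull the anchor toward the positive clip and
    push it away from the negative clip by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 8-D embeddings for three audio clips: anchor and positive stand in
# for clips whose terrain patches look visually similar, the negative for
# a clip from a visually dissimilar patch.
anchor = rng.normal(size=8)
positive = anchor + 0.01 * rng.normal(size=8)
negative = rng.normal(size=8)
loss = triplet_loss(anchor, positive, negative)

def kmeans(X, k, iters=20):
    """Minimal k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        # assign each embedding to its nearest centre, then recompute centres
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two well-separated synthetic embedding clusters standing in for two
# terrain classes; the resulting cluster ids play the role of the
# self-supervised labels later projected onto the visual terrain patches.
embeddings = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 8))
                        for c in (0.0, 3.0)])
labels = kmeans(embeddings, k=2)
```

In the paper's setting the cluster assignments, rather than manual annotations, label the image regions traversed by the robot, which is what makes the downstream segmentation training self-supervised.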
