Cross-View Semantic Segmentation for Sensing Surroundings
IEEE Robotics and Automation Letters ( IF 4.6 ) Pub Date : 2020-06-23 , DOI: 10.1109/lra.2020.3004325
Bowen Pan , Jiankai Sun , Ho Yin Tiga Leung , Alex Andonian , Bolei Zhou

Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from observations. To equip robots with such a surrounding-sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation, along with a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse first-view observations into a top-down-view semantic map that indicates the spatial location of all objects at the pixel level. The main challenge of this task is the lack of real-world annotations for top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and use domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively make use of information from different views and modalities to understand spatial information. A further experiment on a LoCoBot robot shows that our model enables surrounding sensing from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.
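The core idea described above (encoding several first-view observations, transforming each into a shared top-down spatial layout, aggregating across views, and decoding per-pixel class labels) can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, with random weights standing in for learned ones; it is not the authors' implementation, and all names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (not the paper's): N first-view inputs,
# each encoded to a C x H x W feature map; K semantic classes.
N, C, H, W = 4, 8, 16, 16
K = 5

# 1) Per-view encoder output (stand-in for a CNN backbone).
first_view_feats = rng.standard_normal((N, C, H, W))

# 2) View transformation: a learned linear mapping between the
#    flattened first-view layout and the top-down layout
#    (one weight matrix per view; random here for illustration).
W_vt = rng.standard_normal((N, H * W, H * W)) * 0.01

top_down = np.zeros((C, H * W))
for i in range(N):
    flat = first_view_feats[i].reshape(C, H * W)   # C x (H*W)
    top_down += flat @ W_vt[i]                     # spatial re-arrangement
top_down = top_down.reshape(C, H, W) / N           # aggregate the N views

# 3) Decoder head: 1x1 projection to per-pixel class scores,
#    then argmax to get the top-down semantic label map.
W_dec = rng.standard_normal((K, C)) * 0.1
logits = np.einsum('kc,chw->khw', W_dec, top_down)
semantic_map = logits.argmax(axis=0)               # H x W label map

print(semantic_map.shape)
```

In the sketch, the per-view linear maps let the model learn how each camera's pixels correspond to top-down locations, and summing over views fuses the evidence before decoding.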

Updated: 2020-06-23