Geometric Affordance Perception: Leveraging Deep 3D Saliency With the Interaction Tensor, Frontiers in Neurorobotics



Frontiers in Neurorobotics (IF 2.6), Pub Date: 2020-07-07, DOI: 10.3389/fnbot.2020.00045
Eduardo Ruiz, Walterio Mayol-Cuevas


Agents that need to act on their surroundings can benefit significantly from perceiving their interaction possibilities, or affordances. In this paper we combine the benefits of the Interaction Tensor, a straightforward geometric representation that captures multiple object-scene interactions, with deep learning saliency for fast parsing of affordances in the environment. Our approach works with visually perceived 3D point clouds and makes it possible to query a 3D scene for locations that support affordances such as sitting or riding, as well as interactions with everyday objects, such as where to hang an umbrella or place a mug. Crucially, the interaction description exhibits one-shot generalization. Experiments on numerous synthetic and real RGB-D scenes, validated by human subjects, show that the representation enables the prediction of affordance candidate locations in novel environments from a single training example. The approach also allows for a highly parallelizable, multiple-affordance representation and runs at fast rates. Combining a deep neural network that learns to estimate scene saliency with the one-shot geometric representation aligns well with the expectation that computational models for affordance estimation should be perceptually direct and economical.


Updated: 2020-07-07
