2D-3D Geometric Fusion Network using Multi-Neighbourhood Graph Convolution for RGB-D Indoor Scene Classification
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2020-09-23 , DOI: arxiv-2009.11154
Albert Mosella-Montoro, Javier Ruiz-Hidalgo

Multi-modal fusion has been shown to enhance the performance of scene classification tasks. This paper presents a 2D-3D fusion stage that combines 3D geometric features with 2D texture features obtained by 2D Convolutional Neural Networks. To obtain a robust 3D geometric embedding, a network that uses two novel layers is proposed. The first layer, Multi-Neighbourhood Graph Convolution, aims to learn a more robust geometric descriptor of the scene by combining two different neighbourhoods: one in the Euclidean space and the other in the feature space. The second proposed layer, Nearest Voxel Pooling, improves the performance of the well-known Voxel Pooling. Experimental results on the NYU-Depth-v2 and SUN RGB-D datasets show that the proposed method outperforms the current state of the art in RGB-D indoor scene classification tasks.
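To make the dual-neighbourhood idea concrete, below is a minimal PyTorch sketch of a graph-convolution layer that aggregates over both a Euclidean-space k-NN graph and a feature-space k-NN graph. The class name, the k-NN construction, the edge MLPs with max aggregation, and all dimensions are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
# Sketch only: the layer name, k-NN graphs, edge MLPs and max aggregation
# are assumptions; the paper's exact formulation may differ.
import torch
import torch.nn as nn


def knn_indices(x, k):
    """Indices of the k nearest neighbours of each point in x, shape (N, D)."""
    dist = torch.cdist(x, x)                                # pairwise distances (N, N)
    return dist.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match


class MultiNeighbourhoodGraphConv(nn.Module):
    """Aggregates edge features over two graphs: one built in Euclidean
    (xyz) space and one built in the current feature space."""

    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        # One edge MLP per neighbourhood; their outputs are fused at the end.
        self.mlp_euclid = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())
        self.mlp_feat = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())
        self.fuse = nn.Linear(2 * out_dim, out_dim)

    def aggregate(self, feats, idx, mlp):
        neigh = feats[idx]                                # (N, k, C) neighbour features
        centre = feats.unsqueeze(1).expand_as(neigh)      # (N, k, C) centre features
        edge = torch.cat([centre, neigh - centre], dim=-1)
        return mlp(edge).max(dim=1).values                # max over the k neighbours

    def forward(self, xyz, feats):
        idx_e = knn_indices(xyz, self.k)     # neighbourhood in Euclidean space
        idx_f = knn_indices(feats, self.k)   # neighbourhood in feature space
        h_e = self.aggregate(feats, idx_e, self.mlp_euclid)
        h_f = self.aggregate(feats, idx_f, self.mlp_feat)
        return self.fuse(torch.cat([h_e, h_f], dim=-1))


# Toy usage: 1024 points with xyz coordinates and 32-D geometric features.
xyz = torch.rand(1024, 3)
feats = torch.rand(1024, 32)
layer = MultiNeighbourhoodGraphConv(32, 64)
print(layer(xyz, feats).shape)  # torch.Size([1024, 64])
```

The point of the two graphs is that Euclidean neighbours capture local geometry of the scene, while feature-space neighbours can connect semantically similar points that are spatially far apart; concatenating both aggregations lets the layer use whichever cue is informative.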

Updated: 2020-09-24