Learning Spherical Convolution for 360° Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence ( IF 20.8 ) Pub Date : 2022-10-04 , DOI: 10.1109/tpami.2021.3113612
Yu-Chuan Su , Kristen Grauman

While 360° cameras offer tremendous new possibilities in vision, graphics, and augmented reality, the spherical images they produce make visual recognition non-trivial. Ideally, 360° imagery could inherit the deep convolutional neural networks (CNNs) already trained with great success on perspective projection images. However, spherical images cannot be projected to a single plane without significant distortion, and existing methods to transfer CNNs from perspective to spherical images introduce significant computational costs and/or degradations in accuracy. We propose to learn a Spherical Convolution Network (SphConv) that translates a planar CNN to the equirectangular projection of 360° images. Given a source CNN for perspective images as input, SphConv learns to reproduce the flat filter outputs on 360° data, sensitive to the varying distortion effects across the viewing sphere. The key benefits are 1) efficient and accurate recognition for 360° images, and 2) the ability to leverage powerful pre-trained networks for perspective images. We further propose two instantiations of SphConv: Spherical Kernel, which learns location-dependent kernels on the sphere, and Kernel Transformer Network (KTN), which learns a functional transformation that generates SphConv kernels from the source CNN. Of the two variants, KTN has a much lower memory footprint at the cost of higher computational overhead. Validating our approach with multiple source CNNs and datasets, we show that SphConv using KTN successfully preserves the source CNN's accuracy, while offering efficiency, transferability, and scalability to typical image resolutions. We further introduce a spherical Faster R-CNN model based on SphConv and show that we can learn a spherical object detector without any object annotation in 360° images.
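The core idea behind SphConv — a convolution whose kernel varies with latitude on the equirectangular grid, since distortion depends on the image row — can be sketched as follows. This is a minimal NumPy illustration of row-dependent convolution, not the authors' implementation; the function `sphconv_rows` and its per-row kernel list are hypothetical names introduced here for clarity.

```python
import numpy as np

def sphconv_rows(feat, kernels):
    """Apply a location-dependent convolution to an equirectangular
    feature map: each output row uses its own kernel, reflecting the
    latitude-dependent distortion of the projection.

    feat    : (H, W) array, equirectangular feature map
    kernels : list of H arrays; kernels[i] has odd shape (kh, kw)
    """
    H, W = feat.shape
    out = np.zeros_like(feat)
    for i in range(H):
        k = kernels[i]
        kh, kw = k.shape
        # Pad vertically with zeros (the poles end the image),
        # horizontally with wrap-around (longitude is periodic).
        padded = np.pad(feat, ((kh // 2, kh // 2), (0, 0)), mode="constant")
        padded = np.pad(padded, ((0, 0), (kw // 2, kw // 2)), mode="wrap")
        for j in range(W):
            out[i, j] = np.sum(k * padded[i:i + kh, j:j + kw])
    return out
```

In the paper's two variants, the per-row kernels are either learned directly (Spherical Kernel, which stores a kernel per latitude and hence costs memory) or produced on the fly by a learned function of the source CNN's planar kernel (KTN, which trades that memory for extra computation).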

Updated: 2021-09-20