Perspective-Adaptive Convolutions for Scene Parsing.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6). Pub Date: 2019-01-01. DOI: 10.1109/tpami.2018.2890637
Rui Zhang, Sheng Tang, Yongdong Zhang, Jintao Li, Shuicheng Yan

Many existing scene parsing methods adopt Convolutional Neural Networks with receptive fields of fixed sizes and shapes, which often yields inconsistent predictions within large objects and causes small objects to be missed. To tackle this issue, we propose perspective-adaptive convolutions that acquire receptive fields of flexible sizes and shapes during scene parsing. By adding a new perspective regression layer, we can dynamically infer position-adaptive perspective coefficient vectors that are used to reshape the convolutional patches. Consequently, the receptive fields can be adjusted automatically according to the varying sizes and perspective deformations of the objects in scene images. The proposed convolutions are differentiable, so the convolutional parameters and perspective coefficients can be learned end-to-end without any extra training supervision of object sizes. Furthermore, considering that standard convolutions lack contextual information and spatial dependencies, we propose a context-adaptive bias that captures both local and global contextual information through average pooling on the local feature patches and the global feature maps, followed by flexible attentive summing with the convolutional results. The attentive weights are position-adaptive and context-aware, and can be learned through an additional context regression layer. Experiments on the Cityscapes and ADE20K datasets demonstrate the effectiveness of the proposed methods.
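The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: here the per-position coefficient pair (sy, sx), which in the paper would be produced by the perspective regression layer, simply rescales a 3x3 sampling grid, and the attentive weights alpha, which the paper learns via the context regression layer, are taken as given inputs. Bilinear interpolation at the rescaled sampling positions is what keeps the operation differentiable in practice.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2D feature map at real-valued coordinates (y, x),
    clamping to the border (replicate padding)."""
    H, W = feat.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def perspective_adaptive_conv(feat, weight, scale):
    """Convolve feat (H, W) with a k x k kernel whose sampling grid is
    rescaled per position by scale (H, W, 2) = (sy, sx).
    With scale == 1 everywhere this reduces to a standard convolution."""
    H, W = feat.shape
    r = weight.shape[0] // 2
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            sy, sx = scale[i, j]  # position-adaptive coefficients
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    # Stretch (or shrink) the kernel's sampling offsets.
                    acc += weight[di + r, dj + r] * bilinear_sample(
                        feat, i + sy * di, j + sx * dj)
            out[i, j] = acc
    return out

def context_adaptive_bias(feat, alpha):
    """Attentive sum of local (3x3 average-pooled) and global (globally
    average-pooled) context, weighted per position by alpha (H, W, 2)."""
    H, W = feat.shape
    g = feat.mean()  # global average pooling
    local = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            local[i, j] = feat[max(i - 1, 0):i + 2,
                               max(j - 1, 0):j + 2].mean()
    return alpha[..., 0] * local + alpha[..., 1] * g
```

In the full method, the bias would be added to the convolutional output before the nonlinearity; a sanity check is that an identity kernel with unit scale returns the input unchanged.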

Updated: 2020-03-06