Towards Efficient Scene Understanding via Squeeze Reasoning, IEEE Transactions on Image Processing


Towards Efficient Scene Understanding via Squeeze Reasoning
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2021-07-30, DOI: 10.1109/tip.2021.3099369
Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin

Graph-based convolutional models such as the non-local block have been shown to be effective for strengthening the context modeling ability of convolutional neural networks (CNNs). However, their pixel-wise computational overhead is prohibitive, which renders them unsuitable for high-resolution imagery. In this paper, we explore the efficiency of context graph reasoning and propose a novel framework called Squeeze Reasoning. Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and then perform reasoning within that single vector, which significantly reduces the computation cost. Specifically, we build the node graph in the vector, where each node represents an abstract semantic concept. The refined features within the same semantic category become consistent, which benefits downstream tasks. We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks. Despite its simplicity and light weight, the proposed strategy achieves considerable results on different semantic segmentation datasets and shows significant improvements over strong baselines on various other scene understanding tasks, including object detection, instance segmentation and panoptic segmentation. Code is available at https://github.com/lxtGH/SFSegNets .
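The squeeze, reason and refine steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a softmax-weighted pooling for the learned squeeze, a softmax-normalised dot-product affinity between nodes, and a residual broadcast back to the spatial map. The names `squeeze_reasoning`, `w_pool`, `w_node` and `num_nodes` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def squeeze_reasoning(feat, w_pool, w_node, num_nodes):
    """Illustrative squeeze-then-reason block (assumed design, not the paper's code).

    feat:      (C, H, W) input feature map
    w_pool:    (C,) weights producing one pooling logit per pixel (assumption)
    w_node:    (D, D) node transform, with D = C // num_nodes (assumption)
    num_nodes: number of graph nodes K carved out of the channel vector
    """
    c, h, w = feat.shape
    assert c % num_nodes == 0, "channels must split evenly into nodes"
    flat = feat.reshape(c, h * w)                    # (C, HW)

    # 1. Squeeze: weighted pooling of the spatial map into one channel-wise vector.
    attn = softmax(w_pool @ flat)                    # (HW,) weights summing to 1
    vec = flat @ attn                                # (C,)

    # 2. Reason: treat the vector as K nodes of dimension D and propagate
    #    information over a dense K x K affinity graph.
    nodes = vec.reshape(num_nodes, c // num_nodes)   # (K, D)
    adj = softmax(nodes @ nodes.T, axis=-1)          # (K, K), rows sum to 1
    nodes = adj @ nodes @ w_node                     # (K, D)

    # 3. Refine: broadcast the reasoned vector back onto every pixel (residual).
    return feat + nodes.reshape(c, 1, 1)
```

Calling `squeeze_reasoning(feat, w_pool, w_node, num_nodes=4)` on a feature map of shape `(8, 4, 5)` returns an array of the same shape. In this sketch the graph step works on a K x K affinity whose size does not depend on the resolution, so the only spatial work is the pooling, which is linear in the number of pixels. A non-local block instead builds an affinity over all pixel pairs.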


Updated: 2021-08-15
