S2-aware network for visual recognition, Signal Processing: Image Communication



S2-aware network for visual recognition
Signal Processing: Image Communication ( IF 3.5 ) Pub Date : 2021-08-30 , DOI: 10.1016/j.image.2021.116458
Wenyi Zhao ¹, Huihua Yang ¹,², Xipeng Pan ², Lingqiao Li ²


Capturing comprehensive information about features of various sizes and shapes within the same convolution layer is typically a challenging task in computer vision. There are two main kinds of methods for capturing those features. The first uses the inception structure and its variants. The second uses larger convolution kernels on specific layers or stacks more convolution blocks. However, these methods can be computationally intensive or suffer from vanishing gradients. In this paper, to accommodate feature distributions with different sizes and shapes while reducing computational cost, we propose a width- and depth-aware module, named the WD-module, to match feature distributions. Moreover, the proposed WD-module requires less computation and fewer parameters than traditional residual convolution layers. To verify the effectiveness of the proposed method, we built a size- and shape-aware backbone network named S²A-Net by stacking WD-modules. Visualizations of heat maps and features show that the proposed S²A-Net can adapt to objects with different sizes and shapes in visual recognition tasks and learn more comprehensive characteristics. Experimental results show that the proposed method achieves higher accuracy in image recognition and outperforms other state-of-the-art networks with the same number of layers.
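
As a rough illustration of the cost trade-off the abstract mentions between larger kernels and stacked convolution blocks, the sketch below counts the weights of a standard 2D convolution. It is not the WD-module, whose internal design is not described on this page. The helper name `conv_params` and the channel width of 64 are assumptions chosen only for this example.

```python
def conv_params(kernel_size, in_channels, out_channels, bias=False):
    """Count learnable parameters of a standard (dense) 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


channels = 64  # hypothetical channel width, not taken from the paper

# One large 7x7 kernel versus three stacked 3x3 layers.
# Both cover a 7x7 receptive field (3 + 2 + 2 = 7).
single_7x7 = conv_params(7, channels, channels)
stacked_3x3 = 3 * conv_params(3, channels, channels)

print("single 7x7 :", single_7x7)
print("three 3x3  :", stacked_3x3)
```

The stacked variant needs fewer weights for the same receptive field but adds depth, which is where the vanishing-gradient concern raised in the abstract comes from.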


Updated: 2021-09-12
