Learning Granularity-Aware Convolutional Neural Network for Fine-Grained Visual Classification,arXiv - CS - Computer Vision and Pattern Recognition


arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-03-04, DOI: arxiv-2103.02788
Jianwei Song, Ruoyu Yang

Locating discriminative parts plays a key role in fine-grained visual classification because objects from different categories are highly similar. Recent works based on convolutional neural networks mine discriminative regions from the feature maps of the last convolutional layer. However, because of its large receptive field, the last convolutional layer tends to focus on the whole object, which reduces its ability to spot subtle differences. To address this issue, we propose a novel Granularity-Aware Convolutional Neural Network (GA-CNN) that progressively explores discriminative features. Specifically, GA-CNN uses the differing receptive fields of different layers to learn multi-granularity features, and it exploits larger-granularity information based on the smaller-granularity information found at previous stages. To further boost performance, we introduce an object-attentive module that can effectively localize the object given a raw image. GA-CNN does not need bounding-box or part annotations and can be trained end-to-end. Extensive experimental results show that our approach achieves state-of-the-art performance on three benchmark datasets.
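
The receptive-field argument in the abstract can be made concrete with the standard recurrence for stacked convolution and pooling layers. The sketch below is only an illustration: the function name `receptive_fields` and the toy layer stack are assumptions made for this example, not the backbone or code used by the GA-CNN authors.

```python
# Illustrative only: the layer stack below is a made-up toy network,
# not the backbone used in the GA-CNN paper.

def receptive_fields(layers):
    """Return the receptive-field size (in input pixels) after each layer.

    Each layer is a (kernel_size, stride) pair. Uses the standard recurrence
        rf_out   = rf_in + (kernel_size - 1) * jump_in
        jump_out = jump_in * stride
    where `jump` is the spacing, in input pixels, between adjacent outputs.
    """
    rf, jump = 1, 1
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stack: four blocks of a 3x3 convolution (stride 1)
# followed by 2x2 pooling (stride 2).
toy_layers = [(3, 1), (2, 2)] * 4

sizes = receptive_fields(toy_layers)
for depth, size in enumerate(sizes, start=1):
    print(f"layer {depth}: receptive field {size}x{size} pixels")
```

In this toy stack the first layer sees a 3x3 patch while the last sees a 46x46 region, so shallow feature maps describe small parts and deep feature maps describe most of the object. That gap between layers is the property the abstract says GA-CNN uses to learn features at several granularities.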


Updated: 2021-03-05
