AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification, IEEE Transactions on Image Processing



IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2021-02-08, DOI: 10.1109/tip.2021.3055617
Yifeng Ding, Zhanyu Ma, Shaoguo Wen, Jiyang Xie, Dongliang Chang, Zhongwei Si, Ming Wu, Haibin Ling

Fine-grained visual classification (FGVC), which distinguishes sub-categories of objects within the same super-category (e.g., bird species and cars), relies heavily on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this article, by contrast, we show that integrating low-level information (e.g., color, edge junctions, texture patterns) improves performance through enhanced feature representation and more accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of 1) a dual-pathway hierarchy structure with a top-down feature pathway and a bottom-up attention pathway, which learns both high-level semantic and low-level detailed feature representations, and 2) an ROI-guided refinement strategy with ROI-guided dropblock and ROI-guided zoom-in operations, which refines features by enhancing discriminative local regions and eliminating background noise. The proposed AP-CNN can be trained end-to-end without any additional bounding box or part annotations. Extensive experiments on three widely used FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach achieves state-of-the-art performance. Models and code are available at https://github.com/PRIS-CV/AP-CNN_Pytorch-master.
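The abstract names two ROI-guided operations, dropblock and zoom-in, but does not spell out how they are computed. The toy sketch below illustrates the general idea on a single 2D grid using only the Python standard library. It rests on assumptions that are not taken from the paper: the ROI is taken to be the bounding box of cells whose attention is at least half the peak value, attention values are non-negative, and the zoom-in uses nearest-neighbour resizing. All function names are hypothetical, and the released code in the repository above may work quite differently.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def roi_from_attention(att: Grid, ratio: float = 0.5) -> Box:
    """Bounding box of cells with attention >= ratio * peak.

    Assumed selection rule for illustration only; expects non-negative
    attention values and 0 <= ratio <= 1 so the peak cell always qualifies.
    """
    peak = max(max(row) for row in att)
    hits = [(r, c) for r, row in enumerate(att)
            for c, v in enumerate(row) if v >= ratio * peak]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def roi_dropblock(feat: Grid, box: Box) -> Grid:
    """Return a copy of feat with the ROI zeroed out.

    Hiding the most salient region pushes a model to use other regions too.
    """
    top, left, bottom, right = box
    return [[0.0 if top <= r < bottom and left <= c < right else v
             for c, v in enumerate(row)] for r, row in enumerate(feat)]


def roi_zoom_in(img: Grid, box: Box) -> Grid:
    """Crop the ROI and resize it to the input size (nearest neighbour)."""
    top, left, bottom, right = box
    out_h, out_w = len(img), len(img[0])
    crop_h, crop_w = bottom - top, right - left
    return [[img[top + r * crop_h // out_h][left + c * crop_w // out_w]
             for c in range(out_w)] for r in range(out_h)]


# A 4x4 attention map whose salient region is the central 2x2 block.
attention = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]

box = roi_from_attention(attention)     # region to hide or magnify
dropped = roi_dropblock(attention, box)  # salient block suppressed
zoomed = roi_zoom_in(attention, box)     # salient block enlarged to 4x4
```

In this sketch the same attention map serves as both the ROI source and the map being refined, purely to keep the example small. In a real network the ROI would be applied to feature maps or to the input image, and dropping would normally be applied only during training.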


Updated: 2021-02-16
