Coarse2Fine: a two-stage training method for fine-grained visual classification
Machine Vision and Applications (IF 3.3) | Pub Date: 2021-02-25 | DOI: 10.1007/s00138-021-01180-y
Amir Erfan Eshratifar, David Eigen, Michael Gormish, Massoud Pedram

Small inter-class and large intra-class variations are the key challenges in fine-grained visual classification. Objects from different classes share visually similar structures, and objects in the same class can have different poses and viewpoints. Therefore, the proper extraction of discriminative local features (e.g., a bird's beak or a car's headlight) is crucial. Most recent successes on this problem are based on attention models, which can localize and attend to the locally discriminative object parts. In this work, we propose a training method for visual attention networks, Coarse2Fine, which creates a differentiable path from the attended feature maps back to the input space. Coarse2Fine learns an inverse mapping function from the attended feature maps to the informative regions in the raw image, which guides the attention maps to better attend to the fine-grained features. In addition, we propose an initialization method for the attention weights. Our experiments show that Coarse2Fine reduces the classification error by up to 5.1% on common fine-grained datasets.
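The core idea — mapping a coarse attention map back to an informative region of the raw image so a finer stage can re-examine it — can be sketched as follows. This is an illustrative NumPy toy, not the authors' implementation: it uses a hard argmax over the attention map for clarity, whereas the paper's contribution is precisely a *differentiable* path from attended feature maps to the input space; the function name, crop fraction, and map sizes are all assumptions made for the example.

```python
import numpy as np

def attend_and_crop(image, attention, crop_frac=0.5):
    """Toy inverse mapping: take the peak of a coarse attention map and
    return the corresponding crop of the raw image for a fine stage.

    image     : (H, W, C) raw input
    attention : (h, w) non-negative attention map from the coarse stage
    crop_frac : fraction of each image side kept around the attended point
    """
    H, W, _ = image.shape
    h, w = attention.shape
    # Most-attended cell in feature-map coordinates (hard argmax; the
    # paper instead keeps this path differentiable).
    iy, ix = np.unravel_index(np.argmax(attention), attention.shape)
    # Map the cell center back to raw-image pixel coordinates.
    cy = int((iy + 0.5) * H / h)
    cx = int((ix + 0.5) * W / w)
    half_h, half_w = int(H * crop_frac) // 2, int(W * crop_frac) // 2
    # Clamp the crop window so it stays inside the image.
    y0 = min(max(cy - half_h, 0), H - 2 * half_h)
    x0 = min(max(cx - half_w, 0), W - 2 * half_w)
    return image[y0:y0 + 2 * half_h, x0:x0 + 2 * half_w]

# Toy usage: a 64x64 image whose attention peaks in the lower-right quadrant.
img = np.zeros((64, 64, 3))
att = np.zeros((8, 8))
att[6, 6] = 1.0                      # peak at feature cell (6, 6)
crop = attend_and_crop(img, att)
print(crop.shape)                    # (32, 32, 3)
```

In the actual two-stage training, the fine stage would re-classify this higher-resolution region, and the gradient flowing through the (differentiable) inverse mapping would refine the coarse attention weights.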




Updated: 2021-02-25