Alignment Enhancement Network for Fine-grained Visual Categorization
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2). Pub Date: 2021-04-01. DOI: 10.1145/3446208
Yutao Hu, Xuhui Liu, Baochang Zhang, Jungong Han, Xianbin Cao
Fine-grained visual categorization (FGVC) aims to automatically recognize objects from different sub-ordinate categories. Despite attracting considerable attention from both academia and industry, it remains a challenging task due to the subtle visual differences among classes. Cross-layer feature aggregation and cross-image pairwise learning have become prevalent approaches for improving FGVC performance by extracting discriminative, class-specific features. However, simple aggregation strategies fail to fully exploit cross-layer information, while existing pairwise learning methods do not capture long-range interactions between different images. To address these problems, we propose a novel Alignment Enhancement Network (AENet) that performs alignment at two levels: Cross-layer Alignment (CLA) and Cross-image Alignment (CIA). The CLA module exploits the cross-layer relationship between low-level spatial information and high-level semantic information, which strengthens cross-layer feature aggregation and improves the feature representation of input images. The new CIA module is further introduced to produce an aligned feature map, which enhances relevant information and suppresses irrelevant information across the whole spatial region. Our method is based on the underlying assumption that the aligned feature map should be closer to the inputs of CIA when they belong to the same category. Accordingly, we establish a Semantic Affinity Loss to supervise the feature alignment within each CIA block. Experimental results on four challenging datasets show that the proposed AENet achieves state-of-the-art results.
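The supervision idea behind the Semantic Affinity Loss — pull an aligned feature toward the CIA input for same-category pairs, push it away otherwise — can be illustrated with a minimal contrastive-style sketch. All names below are hypothetical illustrations, not the authors' implementation:

```python
# Hypothetical sketch of a semantic-affinity-style alignment loss:
# same-category pairs minimize feature distance; different-category
# pairs are pushed at least `margin` apart (hinge formulation).

def l2_distance(a, b):
    """Euclidean distance between two flat feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def semantic_affinity_loss(aligned, cia_input, same_category, margin=1.0):
    """Contrastive-style loss on an aligned feature vs. the CIA input.

    aligned, cia_input: flattened feature vectors (lists of floats)
    same_category: True if the paired images share a category
    """
    d = l2_distance(aligned, cia_input)
    if same_category:
        return d                      # pull aligned feature closer
    return max(0.0, margin - d)       # push apart by at least `margin`
```

For example, a same-category pair is penalized by its full distance, while a different-category pair already farther apart than the margin incurs zero loss.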

Updated: 2021-04-01