Multilayer feature descriptors fusion CNN models for fine-grained visual recognition
Computer Animation and Virtual Worlds ( IF 0.9 ) Pub Date : 2019-05-01 , DOI: 10.1002/cav.1897
Yong Hou 1 , Hangzai Luo 1 , Wanqing Zhao 1 , Xiang Zhang 1 , Jun Wang 1 , Jinye Peng 1
Fine-grained image classification is a challenging topic in computer vision. General models based on first-order local features cannot achieve acceptable performance because such features are inefficient at capturing fine-grained differences. The bilinear convolutional neural network (CNN) model demonstrates that second-order statistical features capture fine-grained differences more effectively than first-order local features. However, that framework extracts a second-order feature descriptor from only a single convolutional layer; the potentially discriminative features of the other convolutional layers are ignored, at a cost in recognition accuracy. In this paper, a multilayer feature-descriptor fusion CNN model is proposed that jointly exploits the second-order feature descriptors and the first-order local feature descriptors generated by different layers. Experiments were carried out on the fine-grained classification benchmarks CUB-200-2011, Stanford Cars, and FGVC-Aircraft. Compared with the bilinear CNN model, the proposed method improves accuracy by 0.8%, 1.1%, and 5.5%, respectively; compared with the compact bilinear pooling model, it improves accuracy by 0.64%, 1.63%, and 1.45%. In addition, the proposed model uses multiple 1×1 convolution kernels to reduce feature dimensionality, and the experimental results show that the resulting multilayer low-dimensional second-order descriptor fusion model attains recognition accuracy comparable to that of the original model.
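To make the second-order (bilinear) pooling the abstract refers to concrete, the following is a minimal sketch: a bilinear descriptor is the sum of outer products of per-location feature vectors, followed by signed square root and L2 normalization. The `fuse_multilayer` helper, which simply concatenates the descriptors of several layers, is an illustrative stand-in for the paper's fusion scheme, not the authors' exact architecture.

```python
import math

def bilinear_pool(feat_a, feat_b):
    """Second-order (bilinear) descriptor of two feature maps.

    feat_a, feat_b: lists of per-location feature vectors, e.g. the
    activations of two convolutional layers sampled on the same spatial
    grid. Returns a flattened D_a * D_b vector: the sum over locations
    of the outer product, with signed sqrt and L2 normalization as used
    in bilinear CNNs.
    """
    d_a, d_b = len(feat_a[0]), len(feat_b[0])
    desc = [0.0] * (d_a * d_b)
    for va, vb in zip(feat_a, feat_b):          # sum of outer products
        for i, ai in enumerate(va):
            for j, bj in enumerate(vb):
                desc[i * d_b + j] += ai * bj
    # Signed square root, then L2 normalization.
    desc = [math.copysign(math.sqrt(abs(x)), x) for x in desc]
    norm = math.sqrt(sum(x * x for x in desc)) or 1.0
    return [x / norm for x in desc]

def fuse_multilayer(layer_feats):
    """Illustrative fusion: concatenate the self-bilinear descriptors of
    several convolutional layers (the paper's actual fusion may differ)."""
    fused = []
    for feats in layer_feats:
        fused.extend(bilinear_pool(feats, feats))
    return fused
```

Note that a self-bilinear descriptor of a D-channel layer has D² entries, which is why the abstract's 1×1 convolutions matter: reducing D before pooling shrinks the descriptor quadratically.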

Updated: 2019-05-01