Fine-grained visual classification via multilayer bilinear pooling with object localization, The Visual Computer


The Visual Computer (IF 3.5), Pub Date: 2021-01-09, DOI: 10.1007/s00371-020-02052-8
Ming Li, Lin Lei, Hao Sun, Xiao Li, Gangyao Kuang

Fine-grained visual classification is a challenging task in computer vision, and extracting discriminative features is vital to it. As one crucial step, accurate object localization eliminates background noise and highlights the objects of interest at the same time. However, some current methods use bounding boxes to locate objects, which is not suitable when object poses change. Furthermore, deep features have been shown to have strong representation capability, especially bilinear pooling features, which have achieved superior performance in fine-grained visual classification tasks. However, bilinear features captured only from the last convolutional layer have limited discriminability, especially when dealing with small-scale objects. In this paper, we propose a multilayer bilinear pooling model combined with object localization. First, a flexible and scalable object localization module locates the object of interest in an image instead of using bounding boxes. Refined features are then obtained by highlighting the object region and suppressing background noise. Multilayer bilinear pooling, which exploits the complementarity between different layers, is then used to extract more discriminative features. Experimental results on three public datasets show that the proposed method achieves competitive performance compared with several state-of-the-art methods.
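
The abstract does not give implementation details, so the following is only a rough NumPy sketch of the general idea, not the authors' method. It assumes that localization can be approximated by thresholding a channel-summed activation map at its mean, and that "multilayer" pooling means concatenating cross-layer bilinear features. The function names `object_mask`, `bilinear_pool`, and `multilayer_bilinear` are hypothetical, as is the requirement that all layers share one spatial size.

```python
import numpy as np

def object_mask(feature_map):
    """Hypothetical localization step (an assumption, not the paper's module):
    sum activations over channels and keep locations above the mean."""
    activation = feature_map.sum(axis=0)            # (H, W)
    return (activation > activation.mean()).astype(feature_map.dtype)

def bilinear_pool(fa, fb):
    """Standard bilinear pooling of two maps fa (Ca, H, W) and fb (Cb, H, W):
    average the outer product over locations, then apply signed square root
    and L2 normalization. Returns a vector of length Ca * Cb."""
    ca, h, w = fa.shape
    cb = fb.shape[0]
    a = fa.reshape(ca, h * w)
    b = fb.reshape(cb, h * w)
    pooled = (a @ b.T / (h * w)).reshape(-1)
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

def multilayer_bilinear(features):
    """Assumed multilayer scheme: mask every layer with the object mask from
    the last layer, pool every pair of distinct layers, and concatenate."""
    mask = object_mask(features[-1])                # (H, W), broadcast over channels
    refined = [f * mask for f in features]
    parts = [
        bilinear_pool(refined[i], refined[j])
        for i in range(len(refined))
        for j in range(i + 1, len(refined))
    ]
    return np.concatenate(parts)

# Usage with random stand-ins for three convolutional feature maps.
rng = np.random.default_rng(0)
feats = [rng.random((c, 7, 7)) for c in (4, 5, 6)]
descriptor = multilayer_bilinear(feats)
print(descriptor.shape)
```

In this sketch the descriptor length is the sum of the channel-count products over all layer pairs, so it grows quickly with real network widths; practical systems usually reduce channel dimensions before pooling.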


Updated: 2021-01-10
