ACTNET: End-to-End Learning of Feature Activations and Multi-stream Aggregation for Effective Instance Image Retrieval
International Journal of Computer Vision (IF 11.6). Pub Date: 2021-02-10. DOI: 10.1007/s11263-021-01444-0
Syed Sameed Husain, Eng-Jon Ong, Miroslaw Bober

We propose a novel CNN architecture, called ACTNET, for robust instance image retrieval from large-scale datasets. Our key innovation is a learnable activation layer designed to improve the signal-to-noise ratio of deep convolutional feature maps. Further, we introduce a controlled multi-stream aggregation, where complementary deep features from different convolutional layers are optimally transformed and balanced using our novel activation layers before being aggregated into a global descriptor. Importantly, the learnable parameters of our activation blocks are trained explicitly, together with the CNN parameters, in an end-to-end manner that minimises a triplet loss. This means that our network jointly learns the CNN filters and their optimal activation and aggregation for retrieval tasks. To our knowledge, this is the first time parametric functions have been used to control and learn optimal multi-stream aggregation. We conduct an in-depth experimental study of three non-linear activation functions: Sine-Hyperbolic, Exponential and modified Weibull, showing that, while all three bring significant gains, the Weibull function performs best thanks to its ability to equalise strong activations. The results clearly demonstrate that our ACTNET architecture significantly enhances the discriminative power of deep features, improving over state-of-the-art retrieval results on all datasets.
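To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the general idea: a learnable parametric activation applied to convolutional feature maps, two activated feature streams aggregated into a single global descriptor, and end-to-end training with a triplet loss. The specific activation form used here (a Weibull-style CDF with learnable shape and scale), the sum-pooling aggregation, the two-stream setup and the `backbone` interface are all illustrative assumptions, not the exact ACTNET formulation from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableActivation(nn.Module):
    """Weibull-style activation 1 - exp(-(x/lam)^k) with learnable k and lam.
    This functional form is an assumption; the paper's modified Weibull may differ."""
    def __init__(self, k=1.5, lam=1.0):
        super().__init__()
        self.k = nn.Parameter(torch.tensor(float(k)))
        self.lam = nn.Parameter(torch.tensor(float(lam)))

    def forward(self, x):
        x = torch.clamp(x, min=0.0)  # keep activations non-negative
        lam = self.lam.abs().clamp(min=1e-6)
        return 1.0 - torch.exp(-(x / lam) ** self.k.abs())


class TwoStreamDescriptor(nn.Module):
    """Aggregate activated feature maps from two convolutional stages into one
    L2-normalised global descriptor (hypothetical aggregation for illustration)."""
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone  # assumed to return two feature maps per image
        self.act1 = LearnableActivation()
        self.act2 = LearnableActivation()

    def forward(self, images):
        f1, f2 = self.backbone(images)            # e.g. (B, C1, H1, W1) and (B, C2, H2, W2)
        d1 = self.act1(f1).sum(dim=(2, 3))        # sum-pool each activated stream
        d2 = self.act2(f2).sum(dim=(2, 3))
        d = torch.cat([F.normalize(d1, dim=1), F.normalize(d2, dim=1)], dim=1)
        return F.normalize(d, dim=1)              # global descriptor for retrieval


# End-to-end training step: activation parameters and CNN weights receive gradients
# from the same triplet objective (anchor/positive show the same instance, negative does not).
triplet = nn.TripletMarginLoss(margin=0.1)
# loss = triplet(model(anchor), model(positive), model(negative)); loss.backward()
```

Because the activation parameters are ordinary `nn.Parameter`s inside the descriptor module, a single optimiser over `model.parameters()` updates the backbone filters and the per-stream activations jointly, which mirrors the end-to-end training the abstract describes.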



Updated: 2021-02-11