A Comprehensive Study on VLAD, Neural Processing Letters


A Comprehensive Study on VLAD
Neural Processing Letters (IF 2.6), Pub Date: 2021-04-03, DOI: 10.1007/s11063-021-10502-0
Xin Li, Lei Zhang, Zhiping Jian, Liyun Zuo

Recently, the vector of locally aggregated descriptors (VLAD) has proven highly effective in diverse computer vision tasks, including image retrieval, scene classification, and action recognition. Its success stems from its powerful representation ability and computational efficiency. However, its theoretical foundation remains unclear, as do its connections to basic yet important algorithms (e.g., the bag-of-words model and match kernels) and the effect on its performance of parameter configurations (e.g., normalization and pooling) that are also widely used in state-of-the-art algorithms based on local features. In this paper, aiming to realize the full capacity of VLAD, we conduct a comprehensive, in-depth study from the perspectives of both theoretical analysis and experimental practice. As a theoretical contribution, we provide a new formulation of VLAD via match kernels, which connects VLAD with existing important encoding methods based on local features. As a contribution to the practical use of VLAD, we comprehensively investigate the roles and effects of two widely used operations in local feature encoding: normalization and pooling. To the best of our knowledge, our work provides the first comprehensive study of VLAD, which will not only enable a full understanding of it but also provide important guidance for state-of-the-art algorithms based on local features. We have conducted extensive experiments on three benchmark datasets, Scene-15, Caltech 101, and PPMI, for both image classification and action recognition.
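
For readers unfamiliar with the encoding the abstract refers to, the following is a minimal sketch of the commonly described VLAD pipeline: hard assignment of each local descriptor to its nearest codebook centroid, sum pooling of the residuals, then power and L2 normalization. It is not the match-kernel formulation proposed in the paper, and the abstract does not say which pooling or normalization variants the authors evaluate. The function name `vlad_encode`, the `alpha` parameter, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def vlad_encode(descriptors, centroids, alpha=0.5):
    """Encode local descriptors (n, d) against a codebook (k, d) as a k*d vector.

    Illustrative only: sum pooling plus power/L2 normalization is one common
    configuration, not necessarily the one studied in the paper.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    k, d = centroids.shape

    # Hard assignment: index of the nearest centroid for every descriptor.
    sq_dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dists.argmin(axis=1)

    # Sum pooling: accumulate the residuals (x - c) per assigned centroid.
    vlad = np.zeros((k, d))
    np.add.at(vlad, nearest, descriptors - centroids[nearest])

    # Power normalization (signed), then global L2 normalization.
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.abs(vlad) ** alpha
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


# Toy example (hypothetical data): two 2-D centroids, three descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
v = vlad_encode(descriptors, centroids)
```

In practice the codebook would typically be learned with k-means on training descriptors; here it is fixed by hand so the sketch stays self-contained.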


Updated: 2021-04-04
