Distance Encoded Product Quantization for Approximate K-Nearest Neighbor Search in High-Dimensional Space
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 7-5-2018, DOI: 10.1109/tpami.2018.2853161
Jae-Pil Heo, Zhe Lin, Sung-Eui Yoon

Approximate K-nearest neighbor search is a fundamental problem in computer science. The problem is especially important for high-dimensional and large-scale data. Recently, many techniques that encode high-dimensional data into compact codes have been proposed. Product quantization and its variants, which encode the cluster index in each subspace, have been shown to provide impressive accuracy. In this paper, we explore a simple question: is it best to spend the entire bit budget on encoding a cluster index? We have found that as data points lie farther from their cluster centers, the error of the estimated distance grows. To address this issue, we propose a novel compact code representation that encodes both the cluster index and the quantized distance between a point and its cluster center in each subspace, splitting the bit budget between the two. We also propose two distance estimators tailored to our representation. We further extend our method to encode global residual distances in the original space. We have evaluated our proposed methods on benchmarks consisting of GIST, VLAD, and CNN features. Our extensive experiments show that the proposed methods significantly and consistently improve search accuracy over the other tested techniques. This result is achieved mainly because our methods estimate distances accurately.

Updated: 2024-08-22