Re-ranking via local embeddings: A use case with permutation-based indexing and the nSimplex projection, Information Systems



Information Systems (IF 3.0), Pub Date: 2020-02-13, DOI: 10.1016/j.is.2020.101506
Lucia Vadicamo, Claudio Gennaro, Fabrizio Falchi, Edgar Chávez, Richard Connor, Giuseppe Amato

Approximate Nearest Neighbor (ANN) search is a prevalent paradigm for searching intrinsically high dimensional objects in large-scale data sets. Recently, the permutation-based approach for ANN has attracted a lot of interest due to its versatility in being used in the more general class of metric spaces. In this approach, the entire database is ranked by a permutation distance to the query. Permutations typically allow the efficient selection of a candidate set of results, but achieving high recall or precision usually requires this set to be reviewed using the original metric and data. This can lead to a sizeable percentage of the database being retrieved, along with many expensive distance calculations.
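
The filter-and-refine scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`pivot_permutation`, `spearman_footrule`, `search`) are hypothetical, the Spearman footrule is assumed as the permutation distance (the abstract does not say which one is used), and the database is scanned linearly rather than through a real index.

```python
import math


def euclidean(a, b):
    """Example metric; any metric function could be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pivot_permutation(obj, pivots, dist):
    """Return ranks where ranks[i] is the position of pivot i
    when the pivots are sorted by increasing distance to obj."""
    order = sorted(range(len(pivots)), key=lambda i: dist(obj, pivots[i]))
    ranks = [0] * len(pivots)
    for position, pivot_id in enumerate(order):
        ranks[pivot_id] = position
    return ranks


def spearman_footrule(ranks_a, ranks_b):
    """Permutation distance (assumed here): sum of rank displacements."""
    return sum(abs(a - b) for a, b in zip(ranks_a, ranks_b))


def search(query, data, index, pivots, dist, n_candidates, k):
    """Filter by permutation distance, then refine with the original metric.

    index[i] is the precomputed pivot_permutation of data[i].
    """
    q_ranks = pivot_permutation(query, pivots, dist)
    # Filter step: rank the whole database by permutation distance.
    by_perm = sorted(range(len(data)),
                     key=lambda i: spearman_footrule(q_ranks, index[i]))
    candidates = by_perm[:n_candidates]
    # Refine step: this is the costly part, one metric evaluation
    # and one data access per candidate.
    candidates.sort(key=lambda i: dist(query, data[i]))
    return candidates[:k]
```

The refine step is where the cost mentioned in the abstract arises: each candidate requires fetching the original object and evaluating the original metric, so a larger `n_candidates` buys recall at the price of more distance computations.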

To reduce the number of metric computations and the number of database elements accessed, we propose here a re-ranking based on a local embedding using the nSimplex projection. The nSimplex projection produces Euclidean vectors from objects in metric spaces which possess the n-point property. The mapping is obtained from the distances to a set of reference objects, and the original metric can be lower bounded and upper bounded by the Euclidean distance of objects sharing the same set of references.
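
To make the projection more concrete, here is a minimal sketch of one way to build such an embedding from distances alone. It is an assumption-laden illustration rather than the paper's code: the names `apex`, `build_base`, `project`, and `bounds` are hypothetical, the reference objects are assumed to be non-degenerate (no zero altitude), and the upper bound is taken to be the Euclidean distance after negating the last coordinate of one vector, which is the form usually given for nSimplex.

```python
import math


def apex(base, dists):
    """Place a point at distance dists[i] from base vertex i.

    base[i] holds i coordinates (lower-triangular simplex form, base[0]
    is the origin). Returns len(base) coordinates; the last one is the
    non-negative altitude above the base.
    """
    d0_sq = dists[0] ** 2
    x = []
    for i in range(1, len(base)):
        v = base[i]
        altitude = v[i - 1]
        if altitude <= 1e-12:
            raise ValueError("degenerate reference set")
        norm_sq = sum(c * c for c in v)
        # From |x - v|^2 = dists[i]^2 and |x|^2 = dists[0]^2.
        rhs = (d0_sq + norm_sq - dists[i] ** 2) / 2.0
        partial = sum(xj * vj for xj, vj in zip(x, v[: i - 1]))
        x.append((rhs - partial) / altitude)
    x.append(math.sqrt(max(0.0, d0_sq - sum(c * c for c in x))))
    return x


def build_base(pivots, dist):
    """Build the simplex of the n reference objects from their distances."""
    base = [[]]
    for k in range(1, len(pivots)):
        base.append(apex(base, [dist(pivots[k], pivots[j]) for j in range(k)]))
    return base


def project(pivot_dists, base):
    """Map an object to R^n using only its distances to the n references."""
    return apex(base, pivot_dists)


def bounds(x, y):
    """Lower and upper bounds on the original distance between two objects
    projected with the same reference set."""
    head = sum((a - b) ** 2 for a, b in zip(x[:-1], y[:-1]))
    lower = math.sqrt(head + (x[-1] - y[-1]) ** 2)
    upper = math.sqrt(head + (x[-1] + y[-1]) ** 2)
    return lower, upper
```

Note that `project` takes only the distances to the reference objects, never the object itself, which is why distances already computed while building the permutations could be reused for re-ranking.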

Our approach is particularly advantageous for large databases or expensive metric functions. We reuse the distances computed in the permutations in the first stage, and hence the memory footprint of the index is not increased.

An extensive experimental evaluation of our approach is presented, demonstrating excellent results even on a set of hundreds of millions of objects.


Updated: 2020-04-21
