The role of local dimensionality measures in benchmarking nearest neighbor search
Information Systems (IF 3.0), Pub Date: 2021-05-25, DOI: 10.1016/j.is.2021.101807
Martin Aumüller , Matteo Ceccarello

This paper reconsiders common approaches to benchmarking nearest neighbor search. It shows that local intrinsic dimensionality (LID), local relative contrast (RC), and query expansion make it possible to choose query sets spanning a wide range of difficulty for real-world datasets. It also studies empirically how the distribution of these dimensionality measures affects the running-time performance of implementations. To this end, it introduces several visualization concepts that give a more fine-grained view of the inner workings of different nearest neighbor search approaches. Interactive visualizations are available on the companion website.¹ The paper closes with remarks on the diversity of datasets commonly used for nearest neighbor search benchmarking, showing that these real-world datasets are not diverse: results on a single dataset predict results on all other datasets well.
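The abstract names three per-query difficulty measures without defining them. The sketch below uses definitions that are common in the literature; they are assumptions here, not quoted from the paper: the maximum-likelihood LID estimate from the k nearest distances, local relative contrast as the mean distance to all points divided by the k-th nearest-neighbor distance, and expansion as the ratio of the 2k-th to the k-th nearest-neighbor distance. All function names are hypothetical. Distances are computed by brute force (Euclidean), and a query set is then split into easy, medium and hard groups by sorting on one of the scores.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sorted_distances(query: Sequence[float], data: List[Sequence[float]]) -> List[float]:
    """Brute-force distances from the query to every data point, ascending."""
    return sorted(euclidean(query, p) for p in data)


def lid_mle(dists: Sequence[float], k: int) -> float:
    """Assumed MLE estimator: LID = -1 / ((1/k) * sum_{i<=k} ln(d_i / d_k)).

    Requires the k nearest distances to be positive and not all equal.
    """
    knn = dists[:k]
    d_k = knn[-1]
    mean_log = sum(math.log(d / d_k) for d in knn) / k
    if mean_log == 0.0:
        raise ValueError("LID undefined: the k nearest distances are all equal")
    return -1.0 / mean_log


def relative_contrast(dists: Sequence[float], k: int) -> float:
    """Assumed local RC: mean distance to all points divided by the k-th NN distance."""
    return (sum(dists) / len(dists)) / dists[k - 1]


def expansion(dists: Sequence[float], k: int) -> float:
    """Assumed expansion: 2k-th NN distance divided by the k-th NN distance."""
    return dists[2 * k - 1] / dists[k - 1]


def split_by_difficulty(scores: Sequence[float], n: int) -> Dict[str, List[int]]:
    """Indices of the n lowest, n middle and n highest scoring queries.

    With LID as the score, 'hard' holds the highest-LID queries (assumption:
    higher LID means harder). For RC or expansion, lower values mean harder
    queries, so the 'easy' and 'hard' labels would swap.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    start = max(0, len(order) // 2 - n // 2)
    return {
        "easy": order[:n],
        "medium": order[start:start + n],
        "hard": order[len(order) - n:],
    }


if __name__ == "__main__":
    # Toy 1-D dataset: points 1..10 and a query at 0, so the distances are 1..10.
    data = [[float(i)] for i in range(1, 11)]
    d = sorted_distances([0.0], data)
    print(lid_mle(d, 5), relative_contrast(d, 2), expansion(d, 2))
    print(split_by_difficulty([5.0, 1.0, 3.0, 2.0, 4.0], 1))
```

Brute force keeps the sketch simple and exact, which suits computing these scores once for a fixed query set; it costs one pass over the whole dataset per query, so a real benchmark would typically reuse precomputed ground-truth distances instead.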

Updated: 2021-06-05