A survey on graph-based methods for similarity searches in metric spaces
Information Systems (IF 3.0), Pub Date: 2020-02-25, DOI: 10.1016/j.is.2020.101507
Larissa C. Shimomura, Rafael Seidi Oyamada, Marcos R. Vieira, Daniel S. Kaster

Technology development has accelerated the growth in volume of complex data, such as images, videos, time series, and georeferenced data. Similarity search is a widely used approach to retrieving complex data: it retrieves data that are similar according to their intrinsic characteristics. To retrieve complex data efficiently with similarity searches, large data collections must therefore be organized so that similar elements can be found quickly. Many access methods have been proposed in the literature to speed up similarity retrieval from large databases. Recently, graph-based methods have emerged as a very efficient alternative for similarity retrieval, with reports indicating that they outperform non-graph-based methods in several scenarios. However, to the best of our knowledge, no previous work has experimentally analyzed a comprehensive set of graph-based methods using the same search algorithm and execution environment. Our main contribution is a survey of graph-based methods for similarity search. We review graph-based methods (types of graphs and search algorithms) and discuss in detail the applicability of search algorithms (with exact or approximate answers) to each graph type. Our main focus is on static methods in metric spaces. The survey also includes an experimental evaluation of representative graphs implemented on a common platform. We evaluate the relative performance of these graphs with respect to the main construction and query parameters on a variety of real-world datasets. We also present results on synthetic datasets that evaluate how different graph types perform according to different dataset features. Our experimental results reinforce the tradeoff between graph construction cost and search performance, as governed by the construction and search parameters.
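For readers new to the area, here is a minimal sketch of one kind of graph-based method mentioned above: a brute-force k-nearest-neighbor (kNN) graph over points in a metric space, queried with a greedy best-first search that returns approximate answers. It is an illustration under stated assumptions, not the survey's implementation or any specific method it evaluates. The Euclidean metric, the function names `build_knn_graph` and `greedy_search`, and the parameters `k`, `entry`, and `pool_size` are all hypothetical choices made for this example.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance, used here as the assumed metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(data: Sequence[Point], k: int) -> Dict[int, List[int]]:
    """Hypothetical brute-force kNN graph: each vertex links to its k nearest
    other points. Costs O(n^2) distance computations (the construction side
    of the cost/performance tradeoff)."""
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(data):
        # Sort (distance, index) pairs; ties are broken by index.
        others = sorted((euclidean(p, q), j) for j, q in enumerate(data) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph


def greedy_search(data: Sequence[Point], graph: Dict[int, List[int]],
                  query: Point, k: int, entry: int = 0,
                  pool_size: int = 10) -> List[Tuple[float, int]]:
    """Hypothetical greedy best-first search from vertex `entry`.
    Keeps the pool_size closest vertices seen so far and returns up to k
    (distance, index) pairs. Approximate: it may miss true neighbors."""
    visited = {entry}
    d0 = euclidean(data[entry], query)
    candidates = [(d0, entry)]   # min-heap of vertices still to expand
    results = [(-d0, entry)]     # max-heap (negated distances) of best vertices
    while candidates:
        d, v = heapq.heappop(candidates)
        # Stop when the closest unexpanded candidate is worse than the
        # worst vertex in a full result pool.
        if len(results) >= pool_size and d > -results[0][0]:
            break
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = euclidean(data[u], query)
            if len(results) < pool_size or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > pool_size:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-nd, i) for nd, i in results)[:k]


if __name__ == "__main__":
    # 5x5 grid of points; point (x, y) has index x * 5 + y.
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    g = build_knn_graph(grid, k=4)
    print([i for _, i in greedy_search(grid, g, query=(2.1, 2.2), k=3)])
    # prints [12, 13, 17]
```

Raising `k` increases construction cost and raising `pool_size` increases query cost, while both typically improve the chance of finding the true nearest neighbors. This is the kind of construction-versus-search tradeoff the abstract describes.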

Updated: 2020-04-21