Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search,arXiv - CS - Databases



arXiv - CS - Databases, Pub Date: 2020-06-20, DOI: arxiv-2006.11459
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

Data series are a special type of multidimensional data present in numerous domains, where similarity search is a key operation that has been extensively studied in the data series literature. In parallel, the multidimensional community has studied approximate similarity search techniques. We propose a taxonomy of similarity search techniques that reconciles the terminology used in these two domains, we describe modifications to data series indexing techniques enabling them to answer approximate similarity queries with quality guarantees, and we conduct a thorough experimental evaluation to compare approximate similarity search techniques under a unified framework, on synthetic and real datasets in memory and on disk. Although data series differ from generic multidimensional vectors (series usually exhibit correlation between neighboring values), our results show that data series techniques answer approximate similarity queries with strong guarantees and excellent empirical performance, on data series and vectors alike. These techniques outperform the state-of-the-art approximate techniques for vectors when operating on disk, and remain competitive in memory.
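To make "approximate similarity queries with quality guarantees" concrete, here is a minimal sketch. It assumes the common ε-approximate definition: an answer is acceptable if its Euclidean distance to the query is at most (1 + ε) times the exact nearest-neighbor distance. It checks that condition against a brute-force exact scan. All function names are hypothetical, and the random-sampling search is only a stand-in heuristic with no guarantee. It is not one of the indexing techniques evaluated in the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length data series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nn(dataset, query):
    """Exact 1-NN by sequential scan; returns (index, distance)."""
    best_i, best_d = -1, math.inf
    for i, series in enumerate(dataset):
        d = euclidean(series, query)
        if d < best_d:  # strict '<' keeps the first of tied candidates
            best_i, best_d = i, d
    return best_i, best_d


def sampled_nn(dataset, query, sample_size, seed=0):
    """Hypothetical approximate 1-NN: scans only a random subset.

    It offers no quality guarantee; it only stands in for a fast heuristic.
    """
    rng = random.Random(seed)
    candidates = rng.sample(range(len(dataset)), sample_size)
    best_i = min(candidates, key=lambda i: euclidean(dataset[i], query))
    return best_i, euclidean(dataset[best_i], query)


def is_eps_approximate(approx_dist, exact_dist, eps):
    """Assumed epsilon-approximate condition: d_approx <= (1 + eps) * d_exact."""
    return approx_dist <= (1 + eps) * exact_dist


def random_walk(n, rng):
    """Random-walk series: neighboring values are correlated."""
    values, v = [], 0.0
    for _ in range(n):
        v += rng.gauss(0, 1)
        values.append(v)
    return values


if __name__ == "__main__":
    rng = random.Random(42)
    data = [random_walk(64, rng) for _ in range(1000)]
    q = random_walk(64, rng)
    _, d_exact = exact_nn(data, q)
    _, d_approx = sampled_nn(data, q, sample_size=100)
    # The exact distance can never exceed the sampled one; the epsilon check may fail.
    print(d_exact <= d_approx, is_eps_approximate(d_approx, d_exact, eps=0.5))
```

Because the sampled search scans only part of the data, it can fail the ε check. Approximate methods that carry quality guarantees are meant to close exactly this gap, while still avoiding a full scan.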


Updated: 2020-06-23
