Scalable Data Series Subsequence Matching with ULISSE
arXiv - CS - Databases Pub Date : 2020-09-22 , DOI: arxiv-2009.10373
Michele Linardi and Themis Palpanas

Data series similarity search is an important operation that lies at the core of several analysis tasks and applications related to data series collections. Although data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range). Our contribution is two-fold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different lengths. Second, based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports both non-Z-normalized and Z-normalized sequences, and can be used without changes with both Euclidean Distance and Dynamic Time Warping, for answering both k-NN and epsilon-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost when compared to competing approaches. (Paper published in VLDBJ 2020)
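
To make the query semantics in the abstract concrete, here is a minimal Python sketch of a brute-force k-NN subsequence search under Euclidean distance, with optional Z-normalization. It is not the ULISSE index or any algorithm from the paper; it is only the naive sequential scan that such an index is meant to avoid. The function names (`z_normalize`, `euclidean`, `knn_subsequences`) and the toy data are assumptions made for illustration.

```python
import heapq
import math


def z_normalize(seq):
    """Return seq rescaled to zero mean and unit standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        # A constant sequence has no shape; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_subsequences(collection, query, k=1, normalize=True):
    """Brute-force k-NN over every subsequence of length len(query).

    Hypothetical helper, not part of ULISSE. Returns a list of
    (distance, series_index, offset) tuples, nearest first.
    """
    m = len(query)
    q = z_normalize(query) if normalize else list(query)
    candidates = []
    for sid, series in enumerate(collection):
        # Slide a window of the query's length over the series.
        for off in range(len(series) - m + 1):
            window = series[off:off + m]
            w = z_normalize(window) if normalize else window
            candidates.append((euclidean(q, w), sid, off))
    return heapq.nsmallest(k, candidates)


# Toy collection (illustrative data only).
collection = [
    [0, 1, 2, 3, 4, 3, 2, 1, 0],
    [5, 5, 5, 9, 5, 5, 5],
]

# Queries of different lengths against the same collection.
print(knn_subsequences(collection, [10, 20, 30], k=1))
print(knn_subsequences(collection, [5, 5, 9, 5], k=1, normalize=False))
```

Because the scan derives the window length from each query, it handles any query length, but its cost grows with the total number of candidate subsequences. An index built for one fixed length cannot serve the second query without being rebuilt, which is the limitation the abstract describes.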

Updated: 2020-09-23