Scalable data series subsequence matching with ULISSE
The VLDB Journal (IF 2.8), Pub Date: 2020-07-04, DOI: 10.1007/s00778-020-00619-4
Michele Linardi, Themis Palpanas

Data series similarity search is an important operation at the core of several analysis tasks and applications related to data series collections. Although data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length, fixed at index construction time, which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range). Our contribution is twofold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different lengths. Second, based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports both non-Z-normalized and Z-normalized sequences and can be used without changes with both Euclidean distance and dynamic time warping, for answering both k-NN and \(\epsilon\)-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost than competing approaches.
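
To make the problem setting concrete, the sketch below shows a naive sequential-scan baseline for k-NN subsequence matching under Euclidean distance, with optional Z-normalization. It is not ULISSE and not taken from the paper: it uses no index or summarization, and the function names (`z_normalize`, `euclidean`, `knn_subsequences`) are hypothetical names chosen for illustration. A full scan of this kind accepts a query of any length but touches every candidate subsequence, which is the cost an index is meant to avoid.

```python
import heapq
import math


def z_normalize(seq):
    """Rescale seq to zero mean and unit standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        return [0.0] * n  # constant sequence: no shape to compare
    return [(x - mean) / std for x in seq]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_subsequences(series, query, k=1, normalize=True):
    """Brute-force k-NN over every subsequence of len(query) in series.

    Returns a list of (distance, start_offset) pairs, nearest first.
    Illustrative baseline only (an assumption of this sketch), with
    no indexing or pruning.
    """
    m = len(query)
    q = z_normalize(query) if normalize else list(query)
    candidates = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        w = z_normalize(window) if normalize else window
        candidates.append((euclidean(q, w), start))
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    series = [0, 0, 1, 2, 1, 0, 0, 5, 10, 5, 0]
    # Queries of different lengths run against the same series.
    print(knn_subsequences(series, [1, 2, 1], k=1, normalize=False))
    print(knn_subsequences(series, [0, 1, 2, 1, 0], k=1, normalize=False))
    # With Z-normalization, scaled copies of the same shape also match.
    print(knn_subsequences(series, [10, 20, 10], k=2, normalize=True))
```

With Z-normalization enabled, the query `[10, 20, 10]` matches both the small peak and the large peak in `series`, since the two have the same shape after rescaling.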




Updated: 2020-07-05