Coconut: sortable summarizations for scalable indexes over static and streaming data series
The VLDB Journal (IF 4.2), Pub Date: 2019-09-25, DOI: 10.1007/s00778-019-00573-w
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing cannot be sorted while keeping similar data series close to each other in the sorted order. To address this problem, we present Coconut, the first data series index based on sortable summarizations and the first efficient solution for indexing and querying streaming series. The first innovation in Coconut is an inverted, sortable data series summarization that organizes data series based on a z-order curve, keeping similar series close to each other in the sorted order. As a result, Coconut is able to use bulk loading and updating techniques that rely on sorting to quickly build and maintain a contiguous index using large sequential disk I/Os. We then explore prefix-based and median-based splitting policies for bottom-up bulk loading, showing that median-based splitting outperforms the state of the art, ensuring that all nodes are densely populated. Finally, we explore the impact of sortable summarizations on variable-sized window queries, showing that they can be supported in the presence of updates through efficient merging of temporal partitions. Overall, we show analytically and empirically that Coconut dominates the state-of-the-art data series indexes in terms of construction speed, query speed, and storage costs.
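
The z-order idea in the abstract can be illustrated with a minimal sketch. It assumes each series has already been summarized as a fixed number of small integer symbols, one per segment, each using the same number of bits; that summary format, and the names `z_order_key` and `sort_by_z_order`, are hypothetical and are not taken from the paper. The sketch interleaves the bits of all segments so that the most significant bit of every segment comes first, which is what lets a plain sort keep coarsely similar summaries adjacent.

```python
from typing import List, Sequence


def z_order_key(symbols: Sequence[int], bits: int) -> int:
    """Interleave the bits of all segment symbols into one sortable integer.

    Assumption: every symbol fits in `bits` bits. The key takes the most
    significant bit of each segment first, then the next bit of each
    segment, and so on down to the least significant bits.
    """
    key = 0
    for bit in range(bits - 1, -1, -1):   # from most to least significant bit
        for s in symbols:                 # one bit from every segment per round
            key = (key << 1) | ((s >> bit) & 1)
    return key


def sort_by_z_order(summaries: List[Sequence[int]], bits: int) -> List[Sequence[int]]:
    """Sort summaries by their interleaved key (hypothetical helper)."""
    return sorted(summaries, key=lambda s: z_order_key(s, bits))


if __name__ == "__main__":
    # Four toy summaries with two segments of 2 bits each.
    data = [(3, 3), (0, 0), (2, 1), (0, 1)]
    for summary in sort_by_z_order(data, bits=2):
        print(summary, format(z_order_key(summary, 2), "04b"))
```

Sorting by concatenated symbols instead would order series by their first segment alone before considering the others; interleaving gives every segment an equal say at each level of precision. Once the summaries are sorted this way, they could be written out in large sequential runs, which is the kind of sort-based bulk loading the abstract refers to.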

Updated: 2019-09-25