HINT: A Hierarchical Index for Intervals in Main Memory
arXiv - CS - Databases. Pub Date: 2021-04-22, DOI: arxiv-2104.10939
George Christodoulou, Panagiotis Bouros, Nikos Mamoulis

Indexing intervals is a fundamental problem, finding a wide range of applications. Recent work on managing large collections of intervals in main memory focused on overlap joins and temporal aggregation problems. In this paper, we propose novel and efficient in-memory indexing techniques for intervals, with a focus on interval range queries, which are a basic component of many search and analysis tasks. First, we propose an optimized version of a single-level (flat) domain-partitioning approach, which may have large space requirements due to excessive replication. Then, we propose a hierarchical partitioning approach, which assigns each interval to at most two partitions per level and has controlled space requirements. Novel elements of our techniques include the division of the intervals at each partition into groups based on whether they begin inside or before the partition boundaries, reducing the information stored at each partition to the absolutely necessary, and the effective handling of data sparsity and skew. Experimental results on real and synthetic interval sets of different characteristics show that our approaches are typically one order of magnitude faster than the state-of-the-art.
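The hierarchical partitioning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `HierarchicalIntervalIndex`, its methods, and the dictionary-based storage are hypothetical choices made for illustration, and the sketch assumes interval endpoints are integers already mapped to the domain [0, 2^m - 1]. Level l splits the domain into 2^l equal partitions; each interval is placed in at most two partitions per level, and each partition keeps two groups, intervals that begin inside it ("originals") and intervals that begin before it ("replicas"). The optimizations mentioned in the abstract (storage reduction, handling of sparsity and skew) are not reproduced here.

```python
from collections import defaultdict


class HierarchicalIntervalIndex:
    """Illustrative sketch only; names and layout are assumptions."""

    def __init__(self, m):
        self.m = m  # the bottom level m has 2**m unit-width partitions
        # Both maps are keyed by (level, partition_index). Using dicts means
        # empty partitions take no space.
        self.originals = defaultdict(list)  # interval begins inside partition
        self.replicas = defaultdict(list)   # interval begins before partition

    def _store(self, level, idx, interval_id, start):
        shift = self.m - level
        # The partition that contains the start point holds the "original".
        if (start >> shift) == idx:
            self.originals[(level, idx)].append(interval_id)
        else:
            self.replicas[(level, idx)].append(interval_id)

    def insert(self, interval_id, start, end):
        if not (0 <= start <= end < (1 << self.m)):
            raise ValueError("interval must lie inside [0, 2**m - 1]")
        a, b, level = start, end, self.m
        # Walk bottom-up; at each level peel off at most one partition at the
        # left end and one at the right end of the remaining range.
        while level >= 0 and a <= b:
            if a & 1:            # a is a right child: parent would overshoot
                self._store(level, a, interval_id, start)
                a += 1
            if not (b & 1):      # b is a left child: parent would overshoot
                self._store(level, b, interval_id, start)
                b -= 1
            a >>= 1
            b >>= 1
            level -= 1

    def query(self, q_start, q_end):
        """Return ids of intervals overlapping [q_start, q_end], no duplicates."""
        if not (0 <= q_start <= q_end < (1 << self.m)):
            raise ValueError("query must lie inside [0, 2**m - 1]")
        result = []
        for level in range(self.m + 1):
            shift = self.m - level
            first, last = q_start >> shift, q_end >> shift
            for idx in range(first, last + 1):
                result.extend(self.originals.get((level, idx), ()))
                # Replicas are read only in the first relevant partition of
                # each level; elsewhere they would repeat earlier results.
                if idx == first:
                    result.extend(self.replicas.get((level, idx), ()))
        return result


index = HierarchicalIntervalIndex(m=3)  # domain [0, 7]
for i, (s, e) in enumerate([(2, 5), (1, 6), (0, 7), (6, 7), (0, 2), (3, 3)]):
    index.insert(i, s, e)
print(sorted(index.query(3, 4)))  # [0, 1, 2, 5]
```

Under the integer-domain assumption, a stored interval fully covers every partition it is assigned to, so the sketch reports results without comparing endpoints. Splitting each partition into originals and replicas is what lets the query skip duplicate elimination: an interval is an original in exactly one partition across all levels.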

Updated: 2021-04-23