FITing-Tree: A Data-aware Index Structure
arXiv - CS - Databases. Pub Date: 2018-01-30, DOI: arxiv-1801.10207
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska

Index structures are one of the most important tools that DBAs leverage to improve the performance of analytics and transactional workloads. However, building several indexes over large datasets can often become prohibitive and consume valuable system resources. In fact, a recent study showed that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a modern DBMS. This overhead consumes valuable and expensive main memory, and limits the amount of space available to store new data or process existing data. In this paper, we present FITing-Tree, a novel form of a learned index which uses piece-wise linear functions with a bounded error specified at construction time. This error knob provides a tunable parameter that allows a DBA to FIT an index to a dataset and workload by being able to balance lookup performance and space consumption. To navigate this tradeoff, we provide a cost model that helps determine an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using a variety of real-world datasets, we show that our index is able to provide performance that is comparable to full index structures while reducing the storage footprint by orders of magnitude.
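The idea of approximating key positions with piecewise linear functions under a bounded error can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an in-memory sorted list of distinct numeric keys and a non-negative integer error, it uses a simple greedy cone-narrowing pass to cut segments, and every name in it (`build_segments`, `PiecewiseIndex`, `lookup`) is hypothetical.

```python
import bisect
import math


def build_segments(keys, err):
    """Greedily split sorted, distinct keys into linear segments.

    Each segment is (start_key, start_pos, slope). Within a segment,
    start_pos + slope * (key - start_key) is within `err` positions of
    the key's true position (assumption: keys are sorted and distinct).
    """
    if not keys:
        return []
    segments = []
    k0, p0 = keys[0], 0              # origin of the current segment
    lo, hi = 0.0, float("inf")       # feasible slope range (the "cone")
    for pos in range(1, len(keys)):
        dx, dy = keys[pos] - k0, pos - p0
        if lo <= dy / dx <= hi:
            # Point fits: narrow the cone so it stays within err of this point.
            lo = max(lo, (dy - err) / dx)
            hi = min(hi, (dy + err) / dx)
        else:
            # Point falls outside the cone: close the segment, start a new one.
            segments.append((k0, p0, _pick_slope(lo, hi)))
            k0, p0 = keys[pos], pos
            lo, hi = 0.0, float("inf")
    segments.append((k0, p0, _pick_slope(lo, hi)))
    return segments


def _pick_slope(lo, hi):
    # A single-point segment never narrows the cone; any slope works there.
    return 0.0 if hi == float("inf") else (lo + hi) / 2


class PiecewiseIndex:
    """Hypothetical wrapper: segment table plus bounded local search."""

    def __init__(self, keys, err):
        self.keys = keys
        self.err = err
        self.segments = build_segments(keys, err)
        self.starts = [seg[0] for seg in self.segments]

    def lookup(self, key):
        """Return the position of `key` in the sorted list, or None."""
        if not self.segments or key < self.starts[0]:
            return None
        # Find the segment whose start key is the largest one <= key.
        k0, p0, slope = self.segments[bisect.bisect_right(self.starts, key) - 1]
        pred = p0 + slope * (key - k0)
        # Search only the error window (one extra slot guards float rounding).
        hi = min(len(self.keys), math.ceil(pred) + self.err + 2)
        lo = min(hi, max(0, math.floor(pred) - self.err - 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None
```

In this sketch the space cost is one `(start_key, start_pos, slope)` triple per segment, and each lookup is a binary search over segment starts followed by a binary search over a window of roughly `2 * err` positions. A larger `err` typically yields fewer segments but a wider window, which is the trade-off the error parameter exposes.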

Updated: 2020-03-26