Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads, arXiv - CS - Databases

arXiv - CS - Databases, Pub Date: 2020-06-23, DOI: arxiv-2006.13282
Jialin Ding, Vikram Nathan, Mohammad Alizadeh, Tim Kraska

Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes.
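
To make the "specialized sort orders (e.g., Z-order)" mentioned in the abstract concrete, the following is a minimal sketch of a two-dimensional Z-order (Morton) key. It is not taken from the paper and is not Tsunami's method; the function name `interleave_bits` and the sample points are hypothetical and serve only to illustrate the general technique of interleaving coordinate bits so that sorting by a single key keeps nearby points close together.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key.

    Hypothetical helper for illustration; assumes non-negative integer coordinates.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # bit i of x goes to even position 2i
        key |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y goes to odd position 2i+1
    return key


# Sorting records by the interleaved key clusters them along the Z-shaped curve.
points = [(3, 3), (0, 1), (2, 0), (1, 1), (0, 0), (1, 0)]
sorted_by_z = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))
```

A range filter on both dimensions can then scan a contiguous run of keys instead of the whole table, although how tight that run is depends on the data and the query shape, which is the kind of tuning difficulty the abstract refers to.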

Updated: 2020-06-25
