A Pluggable Learned Index Method via Sampling and Gap Insertion
arXiv - CS - Databases, Pub Date: 2021-01-04, DOI: arxiv-2101.00808
Yaliang Li, Daoyuan Chen, Bolin Ding, Kai Zeng, Jingren Zhou

Database indexes facilitate data retrieval and benefit a broad range of applications in real-world systems. Recently, a new family of indexes, called learned indexes, has been proposed. These indexes learn the hidden yet useful distribution of the data and incorporate this information into how the index is learned, which has led to promising performance improvements. However, the "learning" process of learned indexes remains under-explored. In this paper, we propose a formal machine-learning-based framework that quantifies the index learning objective, and we study two general, pluggable techniques that improve the learning efficiency and learning effectiveness of learned indexes. Guided by the formal learning objective, we can learn indexes efficiently with the proposed sampling technique, and learn precise indexes with better generalization ability with the proposed result-driven gap insertion technique. We conduct extensive experiments on real-world datasets and compare several indexing methods from the perspective of the index learning objective. The results show that the proposed framework can help design suitable indexes for different scenarios. Further, we demonstrate the effectiveness of the proposed sampling technique, which achieves up to 78x construction speedup without degrading indexing performance. Finally, we show that the gap insertion technique can enhance both the static and dynamic indexing performance of existing learned index methods, with up to 1.59x query speedup. We will release our code and processed data for further study, to enable more exploration of learned indexes from both the machine learning and database perspectives.
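
The abstract does not spell out the algorithms, so the following is only a minimal Python sketch of the two ideas under stated assumptions. It uses a single linear model (not the learned-index architectures evaluated in the paper) that maps a key to its position in a sorted array, trains that model on a random sample of keys rather than on all of them, and records a maximum prediction error over the full data so that lookups remain exact. `SampledLinearIndex`, `sample_rate`, and `gapped_layout` are hypothetical names. `gapped_layout` is a generic gapped-array layout (similar in spirit to ALEX), not the paper's result-driven gap insertion; it only illustrates why leaving empty slots near predicted positions can help.

```python
import bisect
import random


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # degenerate sample: fall back to a constant model
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


class SampledLinearIndex:
    """Hypothetical learned index: one linear model over a sorted key array.

    The model is trained only on a random sample of (key, position) pairs;
    a full pass then records the worst prediction error so lookups stay exact.
    """

    def __init__(self, sorted_keys, sample_rate=0.01, seed=0):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        rng = random.Random(seed)
        m = min(n, max(2, int(n * sample_rate)))
        sample = sorted(rng.sample(range(n), m))
        # Training cost scales with the sample size m, not with n.
        self.slope, self.intercept = fit_linear(
            [self.keys[i] for i in sample], sample
        )
        # Error bound over ALL keys: guarantees the search window contains the key.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        """Model-predicted position, clamped to the array bounds."""
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or -1 if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)  # last-mile search
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return -1


def gapped_layout(sorted_keys, slope, intercept, gap_ratio=0.5):
    """Spread keys over a larger array, leaving None slots as gaps.

    Generic gapped-array sketch, NOT the paper's result-driven gap insertion:
    each key goes to its model-predicted slot (scaled to the larger capacity),
    bumped right when needed to preserve sorted order.
    """
    n = len(sorted_keys)
    if n == 0:
        return []
    capacity = int(n * (1 + gap_ratio)) + 1
    scale = capacity / n
    slots = [None] * capacity
    prev = -1
    for i, key in enumerate(sorted_keys):
        want = int((slope * key + intercept) * scale)
        # Stay after the previous key and leave room for the keys still to come.
        slot = min(max(want, prev + 1), capacity - (n - i))
        slots[slot] = key
        prev = slot
    return slots


if __name__ == "__main__":
    rng = random.Random(42)
    keys = sorted(rng.sample(range(1_000_000), 10_000))
    index = SampledLinearIndex(keys, sample_rate=0.01)
    print("max error:", index.max_err, "lookup:", index.lookup(keys[1234]))
    layout = gapped_layout(keys, index.slope, index.intercept, gap_ratio=0.5)
    print("slots:", len(layout), "gaps:", layout.count(None))
```

In this toy version the saving from sampling is modest, because computing the error bound still touches every key once; the benefit grows when the model being trained is more expensive than a closed-form linear fit. The gaps let a later insert land near its predicted slot without shifting every following element, which is one intuition for why gaps can help dynamic workloads.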

Updated: 2021-01-05