A Pluggable Learned Index Method via Sampling and Gap Insertion
arXiv - CS - Databases. Pub Date: 2021-01-04. DOI: arxiv-2101.00808
Yaliang Li, Daoyuan Chen, Bolin Ding, Kai Zeng, Jingren Zhou
Database indexes facilitate data retrieval and benefit a broad range of
applications in real-world systems. Recently, a new family of indexes, named
learned indexes, has been proposed to learn hidden yet useful data
distributions and incorporate this information into the learning of indexes,
which leads to promising performance improvements. However, the "learning"
process of learned indexes is still under-explored. In this paper, we propose
a formal machine-learning-based framework to quantify the index learning
objective, and study two general and pluggable techniques to enhance the
learning efficiency and learning effectiveness of learned indexes. Guided by
the formal learning objective, we can learn an index efficiently by
incorporating the proposed sampling technique, and learn a precise index with
enhanced generalization ability through the proposed result-driven gap
insertion technique.

We conduct extensive experiments on real-world datasets and compare several
indexing methods from the perspective of the index learning objective. The
results show that the proposed framework can help design suitable indexes for
different scenarios. Further, we demonstrate the effectiveness of the proposed
sampling technique, which achieves up to 78x construction speedup without
degrading indexing performance. Finally, we show that the gap insertion
technique can enhance both the static and dynamic indexing performance of
existing learned index methods, with up to 1.59x query speedup. We will
release our code and processed data for further study, which can enable more
exploration of learned indexes from the perspectives of both machine learning
and databases.
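To make the sampling idea concrete, here is a minimal Python sketch of a learned index trained on a sample of the keys. It is an illustration under stated assumptions, not the paper's algorithm: the single linear model, the uniform random sampling, the class name `SampledLinearIndex`, and the `sample_rate` parameter are all hypothetical choices made for this example. The model is fitted on a small sample of (key, position) pairs, a worst-case prediction error is then measured over all keys, and lookups run a binary search only inside that error window.

```python
import bisect
import random


def fit_linear(keys, positions):
    """Least-squares fit of position ~= slope * key + intercept."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = sum(positions) / n
    var_k = sum((k - mean_k) ** 2 for k in keys)
    if var_k == 0:
        return 0.0, mean_p
    cov = sum((k - mean_k) * (p - mean_p) for k, p in zip(keys, positions))
    slope = cov / var_k
    return slope, mean_p - slope * mean_k


class SampledLinearIndex:
    """Hypothetical one-model learned index; assumes a non-empty sorted list of distinct numeric keys."""

    def __init__(self, sorted_keys, sample_rate=0.1, seed=0):
        self.keys = sorted_keys
        n = len(sorted_keys)
        rng = random.Random(seed)
        # Assumption: uniform random sampling of positions; train on the sample only.
        m = min(n, max(2, int(n * sample_rate)))
        sample_pos = sorted(rng.sample(range(n), m))
        sample_keys = [sorted_keys[i] for i in sample_pos]
        self.slope, self.intercept = fit_linear(sample_keys, sample_pos)
        # One pass over all keys to record the worst-case prediction error,
        # so lookups stay correct even though training saw only a sample.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(sorted_keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the search window to valid bounds.
        lo = min(n, max(0, guess - self.max_err))
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

In this sketch the sample only affects how well the model fits, and therefore how wide the search window is; correctness comes from the error bound computed over the full key set. A lower `sample_rate` makes the fit cheaper at the risk of a larger `max_err`, which is the kind of trade-off the abstract describes between construction cost and indexing performance.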
Updated: 2021-01-05