Leveraging Soft Functional Dependencies for Indexing Multi-dimensional Data, arXiv - CS - Databases



arXiv - CS - Databases, Pub Date: 2020-06-29, DOI: arxiv-2006.16393
Behzad Ghaffari, Ali Hadian, Thomas Heinis

A new proposal in database indexing is for index structures to automatically learn and exploit the distribution of the underlying data to improve their performance. Initial work on learned indexes has repeatedly shown that, by learning the distribution of the data, index structures such as the B-Tree can boost their performance by an order of magnitude while using a smaller memory footprint. In this work we propose a new class of learned indexes for multidimensional data that, instead of learning only from the distribution of keys, learns from correlations between columns of the dataset. Our approach is motivated by the observation that, in real datasets, correlation between two or more attributes is common. The idea of learning from functional dependencies has previously been explored and implemented in many state-of-the-art query optimisers to predict the selectivity of queries and produce better query plans. In this project we take the use of learned functional dependencies in databases a step further: we use them to reduce the dimensionality of datasets. In doing so we attempt to work around the curse of dimensionality (which, in the context of spatial data, means that the performance of an index deteriorates further with every additional dimension) and thereby accelerate query execution. More precisely, we learn how to infer one or more attributes from the remaining attributes, and hence no longer need to index the predicted columns. This reduces the dimensionality of the index and thus makes it more efficient. We show experimentally that by predicting correlated attributes in the data, rather than indexing them, we can improve query execution time and reduce the memory overhead of the index at the same time.
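
The idea of predicting a correlated attribute instead of indexing it can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single linear soft dependency y ≈ a·x + b with a bounded residual eps, uses a plain sorted array as the one-dimensional index on x, and all names (fit_line, ReducedIndex, range_query) are hypothetical. A two-dimensional range query is answered by translating the constraint on y into a constraint on x through the model and its error bound, scanning the narrowed x-range, and filtering out false positives.

```python
import bisect
import random


def fit_line(xs, ys):
    """Least-squares fit y ~ a*x + b. Returns (a, b, eps), where eps bounds
    the absolute residual of every training point (assumes xs are not all equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    b = mean_y - a * mean_x
    max_res = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
    # Small safety margin against floating-point rounding (an assumption of this sketch).
    eps = max_res * (1 + 1e-9) + 1e-9
    return a, b, eps


class ReducedIndex:
    """1-D sorted index on x only; y is never indexed, only predicted from x."""

    def __init__(self, points):
        self.points = sorted(points)            # sorted by x (then y)
        self.xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        self.a, self.b, self.eps = fit_line(self.xs, ys)

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        """Return all points with x in [x_lo, x_hi] and y in [y_lo, y_hi]."""
        # Every stored point satisfies |y - (a*x + b)| <= eps, so a match must
        # have a*x + b in [y_lo - eps, y_hi + eps]. Solve that for x.
        if self.a != 0:
            bx1 = (y_lo - self.eps - self.b) / self.a
            bx2 = (y_hi + self.eps - self.b) / self.a
            x_lo = max(x_lo, min(bx1, bx2))
            x_hi = min(x_hi, max(bx1, bx2))
        start = bisect.bisect_left(self.xs, x_lo)
        end = bisect.bisect_right(self.xs, x_hi)
        # The narrowed scan may still contain false positives; filter on the real y.
        return [p for p in self.points[start:end] if y_lo <= p[1] <= y_hi]


# Usage on synthetic, strongly correlated data (illustrative only).
rng = random.Random(42)
points = []
for _ in range(10_000):
    x = rng.uniform(0, 100)
    points.append((x, 2 * x + 1 + rng.uniform(-0.5, 0.5)))

index = ReducedIndex(points)
hits = index.range_query(10, 60, 50, 80)
print(len(hits), "points matched; learned model:", index.a, index.b, index.eps)
```

The tighter the dependency (the smaller eps), the narrower the scanned x-range and the fewer false positives need filtering. With weak correlation the translated bounds widen and the benefit shrinks, which is why the sketch keeps the final filter on the true y values.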


Updated: 2020-07-01
