Selectivity Estimation with Attribute Value Dependencies using Linked Bayesian Networks


arXiv - CS - Databases, Pub Date: 2020-09-21, DOI: arxiv-2009.09883
Max Halford, Philippe Saint-Pierre, and Franck Morvan

Relational query optimisers rely on cost models to choose between different query execution plans. Selectivity estimates are known to be a crucial input to the cost model. In practice, standard selectivity estimation procedures are prone to large errors. This is mostly because they rely on the so-called attribute value independence and join uniformity assumptions. Therefore, multidimensional methods have been proposed to capture dependencies between two or more attributes both within and across relations. However, these methods require a large computational cost which makes them unusable in practice. We propose a method based on Bayesian networks that is able to capture cross-relation attribute value dependencies with little overhead. Our proposal is based on the assumption that dependencies between attributes are preserved when joins are involved. Furthermore, we introduce a parameter for trading between estimation accuracy and computational cost. We validate our work by comparing it with other relevant methods on a large workload derived from the JOB and TPC-DS benchmarks. Our results show that our method is an order of magnitude more efficient than existing methods, whilst maintaining a high level of accuracy.
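To make the attribute value independence assumption concrete, here is a minimal Python sketch on a toy relation. It is not the paper's method or code: the relation, the attribute names (`country`, `city`) and the three helper functions are hypothetical, chosen only to illustrate the idea. It compares the true selectivity of a conjunctive predicate with the estimate obtained under independence, and with the estimate obtained by conditioning one attribute on the other, which is the factorisation a two-node Bayesian network would use.

```python
from collections import Counter

# Hypothetical toy relation of (country, city) rows; city determines country,
# so the two attributes are strongly dependent.
rows = (
    [("France", "Paris")] * 4
    + [("France", "Toulouse")] * 2
    + [("Japan", "Tokyo")] * 4
)


def true_selectivity(rows, country, city):
    """Fraction of rows that satisfy country = ? AND city = ?."""
    matches = sum(1 for r in rows if r == (country, city))
    return matches / len(rows)


def avi_selectivity(rows, country, city):
    """Estimate under attribute value independence: P(country) * P(city)."""
    n = len(rows)
    p_country = sum(1 for c, _ in rows if c == country) / n
    p_city = sum(1 for _, t in rows if t == city) / n
    return p_country * p_city


def chain_selectivity(rows, country, city):
    """Estimate with one dependency: P(country) * P(city | country)."""
    n = len(rows)
    country_counts = Counter(c for c, _ in rows)
    pair_counts = Counter(rows)
    if country_counts[country] == 0:
        return 0.0
    p_country = country_counts[country] / n
    p_city_given_country = pair_counts[(country, city)] / country_counts[country]
    return p_country * p_city_given_country


if __name__ == "__main__":
    args = (rows, "France", "Paris")
    print("true :", true_selectivity(*args))
    print("AVI  :", avi_selectivity(*args))
    print("chain:", chain_selectivity(*args))
```

On this toy data the independence estimate undershoots the true selectivity, because it ignores that every Paris row is also a France row. The conditional factorisation recovers it, up to floating-point rounding. With only two attributes the conditional factorisation is exact by construction. The accuracy versus cost trade-off the abstract mentions only arises with more attributes and with joins, which this sketch does not cover.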


Updated: 2020-09-22
