BayesCard: A Unified Bayesian Framework for Cardinality Estimation
arXiv - CS - Databases, Pub Date: 2020-12-29, DOI: arxiv-2012.14743
Ziniu Wu, Amir Shaikhha

Cardinality estimation is one of the fundamental problems in database management systems and an essential component of query optimizers. Traditional machine-learning-based approaches use probabilistic models such as Bayesian Networks (BNs) to learn joint distributions over the data. Recent research advocates deep unsupervised learning and achieves state-of-the-art performance in estimating the cardinality of selection and join queries. Yet the lack of scalability, stability and interpretability of such deep learning models makes them unsuitable for real-world databases. Recent advances in probabilistic programming languages (PPLs) allow for a declarative and efficient specification of probabilistic models such as BNs, and achieve state-of-the-art accuracy in various machine learning tasks. In this paper, we present BayesCard, the first framework that incorporates the techniques behind PPLs for building BNs, together with relational extensions, to accurately estimate the cardinality of selection and join queries in database systems using models up to three orders of magnitude smaller than deep models. Furthermore, the more stable performance and better interpretability of BNs make them viable options for practical query optimizers. Our experimental results on several single-relation and multi-relation databases indicate that BayesCard, with a reasonable estimation time, achieves better estimation accuracy than deep learning models while requiring one to two orders of magnitude less training cost.
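To make "learning a joint distribution" concrete, here is a minimal sketch (not BayesCard's implementation) of how a Bayesian network turns a conjunctive selection predicate into a cardinality estimate. It assumes a hypothetical two-column table with a fixed structure A → B and conditional probability tables fitted by simple counting; BayesCard's structure learning, PPL-based inference and relational (join) extensions are not modeled. The names `ROWS`, `fit_chain_bn`, `estimate_cardinality` and `true_cardinality` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical toy table with two correlated discrete columns (A, B).
ROWS = [
    ("x", 1), ("x", 1), ("x", 2), ("y", 2),
    ("y", 2), ("y", 3), ("z", 3), ("z", 3),
]


def fit_chain_bn(rows):
    """Fit a two-node Bayesian network A -> B by maximum-likelihood counting.

    Returns P(A) as a dict and P(B | A) as a nested dict {a: {b: prob}}.
    """
    n = len(rows)
    count_a = Counter(a for a, _ in rows)
    count_ab = Counter(rows)
    p_a = {a: c / n for a, c in count_a.items()}
    p_b_given_a = defaultdict(dict)
    for (a, b), c in count_ab.items():
        p_b_given_a[a][b] = c / count_a[a]
    return p_a, dict(p_b_given_a)


def estimate_cardinality(p_a, p_b_given_a, n_rows, a_values, b_values):
    """Estimate |{rows : A in a_values and B in b_values}| as
    n_rows * sum_{a in a_values} P(a) * sum_{b in b_values} P(b | a).
    """
    selectivity = 0.0
    for a in a_values:
        if a not in p_a:  # value never seen during training: probability 0
            continue
        cond = p_b_given_a[a]
        selectivity += p_a[a] * sum(cond.get(b, 0.0) for b in b_values)
    return n_rows * selectivity


def true_cardinality(rows, a_values, b_values):
    """Exact count by scanning the table, for comparison."""
    return sum(1 for a, b in rows if a in a_values and b in b_values)


# Usage: estimate the query  A IN ('x', 'y') AND B = 2.
p_a, p_b_given_a = fit_chain_bn(ROWS)
query_a, query_b = {"x", "y"}, {2}
print(estimate_cardinality(p_a, p_b_given_a, len(ROWS), query_a, query_b))
print(true_cardinality(ROWS, query_a, query_b))
```

With only two columns, the factorization P(A)·P(B | A) is exact on the training data; the compactness a BN offers comes from omitting edges between (conditionally) weakly dependent columns when a table has many attributes.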

Updated: 2021-01-01