Approximate Query Processing for Group-By Queries based on Conditional Generative Models
arXiv - CS - Databases. Pub Date: 2021-01-08, DOI: arxiv-2101.02914
Meifan Zhang, Hongzhi Wang

Group-by queries are common in data warehouses, data analytics, and data visualization. Approximate query processing is an effective way to speed up queries on big data. The answer to a group-by query contains multiple values, which makes it difficult to provide sufficiently accurate estimates for all groups. Stratified sampling is more accurate than uniform sampling, but samples chosen for one set of queries may not work for other queries. Online sampling draws samples for the given query at query time, but it incurs high latency. Achieving accuracy and efficiency at the same time is therefore a challenge. To address this challenge, we propose a sample generation framework based on a conditional generative model. The framework can generate any number of samples for a given query without accessing the data. Because it is built on a lightweight model, it can be combined with stratified sampling and online aggregation to improve the estimation accuracy of group-by queries. Experimental results show that the proposed methods are both efficient and accurate.
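
To make the sampling trade-off in the abstract concrete, the following is a minimal Python sketch of estimating `SUM(value) ... GROUP BY key` from a stratified sample versus a uniform sample. It is not the paper's method: the conditional generative model is not shown, and every function name here (`stratified_sample`, `estimate_group_sums`, `uniform_group_sums`) is a hypothetical helper chosen for illustration. It assumes rows are `(key, value)` pairs held in memory.

```python
import random
from collections import defaultdict


def group_rows(rows):
    """Collect the values of (key, value) rows by group key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def stratified_sample(rows, per_group, rng):
    """Draw up to per_group values from every group and keep each group's size."""
    sample = {}
    for key, values in group_rows(rows).items():
        k = min(per_group, len(values))
        sample[key] = (rng.sample(values, k), len(values))
    return sample


def estimate_group_sums(sample):
    """Estimate each group's SUM as group size times the sample mean."""
    return {
        key: size * sum(values) / len(values)
        for key, (values, size) in sample.items()
    }


def uniform_group_sums(rows, n, rng):
    """Estimate group SUMs from n rows drawn uniformly from the whole table.

    Groups that receive no sampled row are absent from the result.
    """
    picked = rng.sample(rows, n)
    scale = len(rows) / n
    sums = defaultdict(float)
    for key, value in picked:
        sums[key] += value * scale
    return dict(sums)


if __name__ == "__main__":
    rng = random.Random(0)
    # One large group and one small group (illustrative data only).
    rows = [("a", 1.0)] * 1000 + [("b", 5.0)] * 10
    print(estimate_group_sums(stratified_sample(rows, 10, rng)))
    print(uniform_group_sums(rows, 50, rng))
```

In this sketch the stratified sample always contains rows from the small group, whereas a uniform sample of the same table can miss it entirely. The stratified sample is still fixed ahead of time, which is the limitation the abstract says the generated samples are meant to address.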

Updated: 2021-01-11