Semantic Optimization of Conjunctive Queries
Journal of the ACM (IF 2.3), Pub Date: 2020-10-29, DOI: 10.1145/3424908
Pablo Barceló, Diego Figueira, Georg Gottlob, Andreas Pieris

This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focused on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypertreewidth bounded by a fixed constant k ≥ 1; the associated fragment is denoted GHW_k. A CQ is semantically in GHW_k if it is equivalent to a CQ in GHW_k. The problem of checking whether a CQ is semantically in GHW_k has been studied in the constraint-free case, where it has been shown to be NP-complete. However, if the database is subject to constraints such as tuple-generating dependencies (TGDs), which can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs), which capture, e.g., key dependencies, a CQ may turn out to be semantically in GHW_k under the constraints while not being semantically in GHW_k without them. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: given a CQ and a set of constraints, is the query semantically in GHW_k, for a fixed k ≥ 1, under the constraints? In other words, is the query equivalent to one that belongs to GHW_k over all databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in GHW_1 is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs.
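The fragment GHW_1 coincides with the class of α-acyclic CQs (a standard fact about generalized hypertree decompositions, not stated in the abstract itself), so syntactic membership for k = 1 can be tested with the classical GYO reduction. The following is a minimal sketch, assuming a hypothetical encoding of a CQ body as a list of (relation, variables) atoms with no constants; the encoding and the name `is_acyclic` are illustrative, not taken from the paper. It checks only syntactic membership; deciding semantic membership, the subject of the article, is a different and harder problem.

```python
from collections import Counter

# Hypothetical encoding: an atom is (relation_name, tuple_of_variables).
# Every term is assumed to be a variable (no constants).
Atom = tuple[str, tuple[str, ...]]


def is_acyclic(atoms: list[Atom]) -> bool:
    """Return True iff the CQ's hypergraph is alpha-acyclic (CQ in GHW_1),
    using the GYO reduction."""
    # One hyperedge per atom: the set of variables occurring in it.
    edges = [set(args) for _, args in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete every variable that occurs in exactly one hyperedge.
        counts = Counter(v for e in edges for v in e)
        for e in edges:
            lonely = {v for v in e if counts[v] == 1}
            if lonely:
                e -= lonely  # mutates the set stored in `edges`
                changed = True
        # Rule 2: delete one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # Acyclic iff the reduction eliminates every hyperedge.
    return not edges


path = [("E", ("x", "y")), ("E", ("y", "z"))]
triangle = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
triangle_with_cover = triangle + [("R", ("x", "y", "z"))]

print(is_acyclic(path))                 # True
print(is_acyclic(triangle))             # False
print(is_acyclic(triangle_with_cover))  # True
```

The last call shows that a query whose binary atoms form a triangle can still be in GHW_1: each binary atom is absorbed by the ternary atom R(x, y, z), so the reduction succeeds.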
In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and which do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, with a complexity that coincides with that of CQ containment. We also consider key dependencies over unary and binary relations and show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in GHW_k alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in GHW_k, we discuss how it can be approximated in an optimal way by a CQ that falls in GHW_k. Such approximations might help find “quick” answers to the input query when exact evaluation is intractable.

Updated: 2020-10-29