Consistent Query Answering for Primary Keys in Datalog

Koutris, Paraschos; Wijsen, Jef

doi:10.1007/s00224-020-09985-6

Published: 30 June 2020

Theory of Computing Systems, volume 65, pages 122–178 (2021)

Paraschos Koutris¹ &
Jef Wijsen²

Abstract

We study the complexity of consistent query answering on databases that may violate primary key constraints. A repair of such a database is any consistent database that can be obtained by deleting a minimal set of tuples. For every Boolean query q, CERTAINTY(q) is the problem that takes a database as input and asks whether q evaluates to true on every repair. In Koutris and Wijsen (ACM Trans. Database Syst. 42(2), 9:1–9:45, 2017), the authors show that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either in P or coNP-complete, and it is decidable which of the two cases applies. In this article, we sharpen this result by showing that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either expressible in symmetric stratified Datalog (with some aggregation operator) or coNP-complete. Since symmetric stratified Datalog is in L, we thus obtain a complexity-theoretic dichotomy between L and coNP-complete. Another new finding of practical importance is that CERTAINTY(q) is on the logspace side of the dichotomy for queries q where all join conditions express foreign-to-primary key matches, which is undoubtedly the most common type of join condition.
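The definitions of repair and CERTAINTY(q) can be made concrete with a brute-force sketch. It is not the article's method (the article concerns logspace and Datalog algorithms, whereas this enumeration is exponential in the number of blocks). The encoding of facts as `(relation, tuple)` pairs and the names `blocks`, `repairs`, `certain`, and `key_len` are assumptions made for illustration, as is the example query ∃x∃y∃z (R(x, y) ∧ S(y, z)) with the first attribute of each relation as primary key.

```python
from itertools import product

def blocks(db, key_len):
    """Group facts (relation, tuple) by relation name and primary-key value."""
    groups = {}
    for rel, tup in db:
        k = key_len[rel]  # number of leading key positions (assumed encoding)
        groups.setdefault((rel, tup[:k]), []).append((rel, tup))
    return list(groups.values())

def repairs(db, key_len):
    """Yield every repair: exactly one fact is kept from each block."""
    for choice in product(*blocks(db, key_len)):
        yield set(choice)

def certain(db, key_len, query):
    """True iff the Boolean query holds on every repair (brute force)."""
    return all(query(r) for r in repairs(db, key_len))

def q(r):
    """Hypothetical query: some R-fact joins with some S-fact on y."""
    return any(rel1 == "R" and rel2 == "S" and t1[1] == t2[0]
               for rel1, t1 in r for rel2, t2 in r)

key_len = {"R": 1, "S": 1}
db = [("R", ("a", 1)), ("R", ("a", 2)), ("S", (1, "c")), ("S", (2, "d"))]
db_without_s2 = db[:3]  # drop S(2, d): the repair keeping R(a, 2) falsifies q

print(certain(db, key_len, q))             # True
print(certain(db_without_s2, key_len, q))  # False
```

In the first database both repairs satisfy the query, so it is a "yes"-instance; removing S(2, d) leaves a repair in which R(a, 2) has no join partner.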

Notes

The quotient graph of a directed graph G = (V,E) with respect to an equivalence relation ≡ on V is a directed graph whose vertices are the equivalence classes of ≡; there is a directed edge from class A to class B if E has a directed edge from some vertex in A to some vertex in B.
Here, α[Z ∪{w}] is the restriction of α to Z ∪{w}.
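The quotient graph defined in the first note can be sketched in a few lines. The function name `quotient_graph` and the representation of the equivalence relation as a dictionary from vertices to class labels are assumptions made for illustration.

```python
def quotient_graph(vertices, edges, class_of):
    """Quotient of a directed graph with respect to an equivalence relation.

    vertices: iterable of vertices; edges: iterable of (u, v) pairs;
    class_of: dict mapping each vertex to a hashable label of its class.
    Returns (classes, quotient_edges).
    """
    classes = {class_of[v] for v in vertices}
    # Edge from class A to class B iff some edge goes from A to B.
    quotient_edges = {(class_of[u], class_of[v]) for u, v in edges}
    return classes, quotient_edges

# Toy example: vertices 1 and 2 are equivalent.
V = [1, 2, 3, 4]
E = [(1, 3), (2, 3), (3, 4)]
cls = {1: "A", 2: "A", 3: "B", 4: "C"}
classes, q_edges = quotient_graph(V, E, cls)
print(sorted(q_edges))  # [('A', 'B'), ('B', 'C')]
```

Note that, read literally, the definition allows a self-loop on a class whenever two of its members are connected by an edge; the sketch keeps such loops.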

References

Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995). http://webdam.inria.fr/Alice/
Arenas, M., Bertossi, L. E., Chomicki, J.: Consistent query answers in inconsistent databases. In: ACM PODS, pp. 68–79. https://doi.org/10.1145/303976.303983 (1999)
Arenas, M., Bertossi, L. E., Chomicki, J., He, X., Raghavan, V., Spinrad, J. P.: Scalar aggregation in inconsistent databases. Theor. Comput. Sci. 296(3), 405–434 (2003). https://doi.org/10.1016/S0304-3975(02)00737-5
Aspvall, B., Plass, M. F., Tarjan, R. E.: A linear-time algorithm for testing the truth of certain quantified boolean formulas. Inf. Process. Lett. 8 (3), 121–123 (1979). https://doi.org/10.1016/0020-0190(79)90002-4
Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An introduction to description logic. Cambridge University Press, Cambridge (2017). http://www.cambridge.org/de/academic/subjects/computer-science/knowledge-management-databases-and-data-mining/introduction-description-logic?format=PB#17zVGeWD2TZUeu6s.97
Barceló, P., Fontaine, G.: On the data complexity of consistent query answering over graph databases. J. Comput. Syst. Sci. 88, 164–194 (2017). https://doi.org/10.1016/j.jcss.2017.03.015
Bertossi, L. E.: Database repairing and consistent query answering. Synthesis lectures on data management. Morgan & Claypool Publishers, San Rafael (2011)
Bertossi, L. E.: Database repairs and consistent query answering: Origins and further developments. In: Suciu, D., Skritek, S., Koch, C. (eds.) Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019. https://doi.org/10.1145/3294052.3322190, pp 48–58. ACM (2019)
Bienvenu, M., Bourgaux, C.: Inconsistency-tolerant querying of description logic knowledge bases. In: Pan, J.Z., Calvanese, D., Eiter, T., Horrocks, I., Kifer, M., Lin, F., Zhao, Y. (eds.) Reasoning Web: Logical foundation of knowledge graph construction and query answering - 12th International Summer School 2016, Aberdeen, UK, September 5-9, 2016, Tutorial lectures, Lecture notes in computer science. https://doi.org/10.1007/978-3-319-49493-7_5, vol. 9885, pp 156–202. Springer (2016)
Bulatov, A. A.: Complexity of conservative constraint satisfaction problems. ACM Trans. Comput. Log. 12(4), 24:1–24:66 (2011). https://doi.org/10.1145/1970398.1970400
Dixit, A. A., Kolaitis, P. G.: A SAT-based system for consistent query answering. In: Janota, M., Lynce, I. (eds.) Theory and Applications of Satisfiability Testing - SAT 2019 - 22nd International Conference, SAT 2019, Lisbon, Portugal, July 9-12, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11628, pp 117–135. Springer (2019), https://doi.org/10.1007/978-3-030-24258-9_8
Egri, L., Larose, B., Tesson, P.: Symmetric Datalog and constraint satisfaction problems in Logspace. In: LICS, pp. 193–202. https://doi.org/10.1109/LICS.2007.47 (2007)
Fontaine, G.: Why is it hard to obtain a dichotomy for consistent query answering? ACM Trans. Comput. Log. 16 (1), 7:1–7:24 (2015). https://doi.org/10.1145/2699912
Fuxman, A., Miller, R. J.: First-order query rewriting for inconsistent databases. In: ICDT, pp 337–351 (2005), https://doi.org/10.1007/978-3-540-30570-5_23
Fuxman, A., Miller, R. J.: First-order query rewriting for inconsistent databases. J. Comput. Syst. Sci. 73(4), 610–635 (2007). https://doi.org/10.1016/j.jcss.2006.10.013
Grädel, E., Kolaitis, P. G., Libkin, L., Marx, M., Spencer, J., Vardi, M. Y., Venema, Y., Weinstein, S.: Finite model theory and its applications. Texts in theoretical computer science. An EATCS series. Springer (2007). https://doi.org/10.1007/3-540-68804-8
Greco, S., Pijcke, F., Wijsen, J.: Certain query answering in partially consistent databases. PVLDB 7(5), 353–364 (2014). http://www.vldb.org/pvldb/vol7/p353-greco.pdf
Grohe, M., Schwentick, T.: Locality of order-invariant first-order formulas. ACM Trans. Comput. Log. 1(1), 112–130 (2000). https://doi.org/10.1145/343369.343386
Kolaitis, P.G., Pema, E., Tan, W.: Efficient querying of inconsistent databases with binary integer programming. PVLDB 6(6), 397–408 (2013). http://www.vldb.org/pvldb/vol6/p397-tan.pdf
Koutris, P., Wijsen, J.: The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In: PODS. https://doi.org/10.1145/2745754.2745769, pp 17–29 (2015)
Koutris, P., Wijsen, J.: Consistent query answering for self-join-free conjunctive queries under primary key constraints. ACM Trans. Database Syst. 42 (2), 9:1–9:45 (2017). https://doi.org/10.1145/3068334
Koutris, P., Wijsen, J.: Consistent query answering for primary keys and conjunctive queries with negated atoms. In: PODS, pp 209–224 (2018), https://doi.org/10.1145/3196959.3196982
Koutris, P., Wijsen, J.: Consistent query answering for primary keys in logspace. In: Barceló, P., Calautti, M. (eds.) 22nd International Conference on Database Theory, ICDT 2019, March 26-28, 2019, Lisbon, Portugal, LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, vol. 127, pp 23:1–23:19 (2019), https://doi.org/10.4230/LIPIcs.ICDT.2019.23
Lembo, D., Lenzerini, M., Rosati, R., Ruzzi, M., Savo, D. F.: Inconsistency-tolerant query answering in ontology-based data access. J. Web Sem. 33, 3–29 (2015). https://doi.org/10.1016/j.websem.2015.04.002
Libkin, L.: Elements of finite model theory. Texts in theoretical computer science. An EATCS series. Springer (2004). https://doi.org/10.1007/978-3-662-07003-1
Lincoln, A., Williams, V. V., Williams, R. R.: Tight hardness for shortest cycles and paths in sparse graphs. In: ACM-SIAM SODA. https://doi.org/10.1137/1.9781611975031.80, pp 1236–1252 (2018)
Lutz, C., Wolter, F.: On the relationship between consistent query answering and constraint satisfaction problems. In: ICDT. https://doi.org/10.4230/LIPIcs.ICDT.2015.363, pp 363–379 (2015)
Marileo, M. C., Bertossi, L. E.: The consistency extractor system: Answer set programs for consistent query answering in databases. Data Knowl. Eng. 69(6), 545–572 (2010). https://doi.org/10.1016/j.datak.2010.01.005
Maslowski, D., Wijsen, J.: A dichotomy in the complexity of counting database repairs. J. Comput. Syst. Sci. 79(6), 958–983 (2013). https://doi.org/10.1016/j.jcss.2013.01.011
Maslowski, D., Wijsen, J.: Counting database repairs that satisfy conjunctive queries with self-joins. In: ICDT, pp 155–164 (2014), https://doi.org/10.5441/002/icdt.2014.18
Pijcke, F.: Theoretical and practical methods for consistent query answering in the relational data model. Ph.D. thesis, University of Mons (2018)
Przymus, P., Boniewicz, A., Burzanska, M., Stencel, K.: Recursive query facilities in relational databases: a survey. In: FGIT. https://doi.org/10.1007/978-3-642-17622-7_10, pp 89–99 (2010)
Reingold, O.: Undirected connectivity in log-space. J. ACM 55 (4), 17:1–17:24 (2008). https://doi.org/10.1145/1391289.1391291
Wijsen, J.: On the First-order expressibility of computing certain answers to conjunctive queries over uncertain databases. In: PODS. https://doi.org/10.1145/1807085.1807111, pp 179–190 (2010)
Wijsen, J.: Certain conjunctive query answering in first-order logic. ACM Trans. Database Syst. 37(2), 9:1–9:35 (2012). https://doi.org/10.1145/2188349.2188351
Wijsen, J.: A survey of the data complexity of consistent query answering under key constraints. In: FoIKS. https://doi.org/10.1007/978-3-319-04939-7_2, pp 62–78 (2014)
Wijsen, J.: Foundations of query answering on inconsistent databases. SIGMOD Rec. 48(3), 6–16 (2019). https://doi.org/10.1145/3377391.3377393

Author information

Authors and Affiliations

University of Wisconsin-Madison, Madison, WI, USA
Paraschos Koutris
University of Mons, Mons, Belgium
Jef Wijsen

Corresponding author

Correspondence to Jef Wijsen.

Additional information

Appendix E: Proofs of Section 9

We will use the following helping lemma.

Lemma 19

Let q be a query in sjfBCQ that has the key-join property. Then, for all F,G ∈ q, if $F\overset {q}{\rightsquigarrow }G$, then there exists a sequence $F_{0},F_{1},\dots ,F_{\ell }$ such that F₀ = F, F_ℓ = G, and for all $i\in \{1,2,\dots ,\ell \}$, ${\mathsf {key}}({F_{i}})\subseteq {\mathsf {vars}}({F_{i-1}})$.

Proof

Assume $F\overset {q}{\rightsquigarrow }G$. We can assume a shortest sequence

$$ F_{0}\stackrel{x_{1}}{\smallfrown}F_{1}\stackrel{x_{2}}{\smallfrown}F_{2}\dotsm\stackrel{x_{\ell-1}}{\smallfrown}F_{\ell-1}\stackrel{x_{\ell}}{\smallfrown}F_{\ell} \tag{7} $$

that is a witness for $F\overset {q}{\rightsquigarrow }G$. Clearly, for all $i\in \{0,1,\dots ,\ell -1\}$, ${\mathsf {vars}}({F_{i}})\cap {\mathsf {vars}}({F_{i+1}})\neq \emptyset $. Then, since q has the key-join property, for all $i\in \{0,1,\dots ,\ell -1\}$, either

1. ${\mathsf {vars}}({F_{i}})\cap {\mathsf {vars}}({F_{i+1}})\in \{{\mathsf {key}}({F_{i}}),{\mathsf {key}}({F_{i+1}})\}$, or
2. ${\mathsf {vars}}({F_{i}})\cap {\mathsf {vars}}({F_{i+1}})\supseteq {\mathsf {key}}({F_{i}})\cup {\mathsf {key}}({F_{i+1}})$.

We show by induction on increasing i that for all $i\in \{1,\dots ,\ell \}$, ${\mathsf {key}}({F_{i}})\subseteq {\mathsf {vars}}({F_{i-1}})$.

Induction Basis i = 1. From $x_{1}\notin {F_{0}}^{+,{q}}$, it follows x₁ ∉ key(F₀). It follows that vars(F₀) ∩ vars(F₁) ≠ key(F₀). Consequently, vars(F₀) ∩ vars(F₁) includes key(F₁).

Induction Step $i\rightarrow i+1$. The induction hypothesis is that ${\mathsf {key}}({F_{i}})\subseteq {\mathsf {vars}}({F_{i-1}})$. Assume, towards a contradiction, ${\mathsf {vars}}({F_{i}})\cap {\mathsf {vars}}({F_{i+1}})={\mathsf {key}}({F_{i}})$. It follows $x_{i+1}\in {\mathsf {vars}}({F_{i-1}})$. Then the witness (7) can be shortened by replacing the subsequence $F_{i-1}\stackrel {x_{i}}{\smallfrown }F_{i}\stackrel {x_{i+1}}{\smallfrown }F_{i+1}$ with $F_{i-1}\stackrel {x_{i+1}}{\smallfrown }F_{i+1}$, contradicting our assumption that no witness for $F\overset {q}{\rightsquigarrow }G$ is shorter than (7). We conclude by contradiction that ${\mathsf {vars}}({F_{i}})\cap {\mathsf {vars}}({F_{i+1}})\neq {\mathsf {key}}({F_{i}})$. Consequently, ${\mathsf {vars}}({F_{i}})\cap {\mathsf {vars}}({F_{i+1}})$ includes ${\mathsf {key}}({F_{i+1}})$. □

The proof of Theorem 4 can now be given.

Proof of Theorem 4

Assume that q has the key-join property. We show that the attack graph of q contains no strong attacks. To this end, assume $F\stackrel {q}{\rightsquigarrow }G$. The sequence $F_{0},F_{1},\dots ,F_{\ell -1}$ in the statement of Lemma 19 is a sequential proof for ${\mathcal {K}}({q})\models {{\mathsf {key}}({F_{0}})}\rightarrow {{\mathsf {key}}({F_{\ell }})}$, and therefore the attack $F\overset {q}{\rightsquigarrow }G$ is weak. The result then follows from Theorem 3. □
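The step from the chain condition of Lemma 19 to the implication ${\mathcal {K}}({q})\models {{\mathsf {key}}({F_{0}})}\rightarrow {{\mathsf {key}}({F_{\ell }})}$ can be checked mechanically with the textbook attribute-closure algorithm for functional dependencies. The sketch below assumes a hypothetical three-atom query R(x̲, y), S(y̲, z), T(z̲, w); the dictionary `atoms` and the helper names are illustrative and do not come from the article.

```python
def closure(attrs, fds):
    """Closure of a set of variables under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Hypothetical query: atom name -> (key(F), vars(F)).
atoms = {
    "R": (frozenset("x"), frozenset("xy")),
    "S": (frozenset("y"), frozenset("yz")),
    "T": (frozenset("z"), frozenset("zw")),
}

def chain_ok(seq):
    """Check key(F_i) ⊆ vars(F_{i-1}) for every i ≥ 1, as in Lemma 19."""
    return all(atoms[seq[i]][0] <= atoms[seq[i - 1]][1]
               for i in range(1, len(seq)))

# K(q) contains key(F) -> vars(F) for every atom F of q.
K_q = [(key, vs) for key, vs in atoms.values()]

print(chain_ok(["R", "S", "T"]))                       # True
print(atoms["T"][0] <= closure(atoms["R"][0], K_q))    # True
```

Whenever `chain_ok` holds for a sequence, applying the dependencies of F₀, F₁, … in order adds key(F_i) to the closure at step i, which is exactly the sequential proof used above.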

This article belongs to the Topical Collection: Special Issue on Database Theory (ICDT 2019)

Guest Editor: Pablo Barceló

This article extends an earlier, shorter version entitled “Consistent Query Answering for Primary Keys in Logspace” which was presented at the 22nd International Conference on Database Theory (ICDT 2019) [23].

Appendices

Appendix A: Overview of Different Graphs and Notations

Graph	Vertices	Edge Notation	Short Description
attack graph	query atoms	$F\overset {q}{\rightsquigarrow }G$	See Section 3. Informally, $F\overset {q}{\rightsquigarrow }G$ means that there exists a “yes”-instance of CERTAINTY(q) in which two key-equal F-facts join with (and only with) two G-facts that are not key-equal (cf. [35, Proposition 6.4]).
M-graph	query atoms	$F\stackrel {\mathsf {{~}_{M}}}{\longrightarrow }G$	Definition 3. Informally, $F\stackrel {\mathsf {{~}_{M}}}{\longrightarrow }G$ states that the functional dependency ${{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({G})}$ is a logical consequence of the primary keys in atoms of mode c.
↪-graph	database facts	$A\hookrightarrow B$	Definition 4, data-level instantiation of the M-graph
↪_C-graph	database facts	$A\stackrel {{~}_{C}}{\hookrightarrow }B$	Definition 5, subgraph of the ↪-graph induced by an M-cycle C
block-quotient graph	database blocks	$({\mathbf {b}},{\mathbf {b}}^{\prime })$	Definition 6, quotient graph of the ↪_C-graph relative to the equivalence relation “is key-equal to”

Notation	Meaning
key(F)	the set of all variables occurring in the primary key of atom F
vars(F)	the set of all variables occurring in atom F
vars(q)	the set of all variables occurring in query q
$\sim $	the equivalence relation “is key-equal to”, e.g., $R(\underline {a},1)\sim R(\underline {a},2)$
rset(db)	the set of all repairs of a database db
block(A,db)	the set of all facts in db that are key-equal to the fact A
$R(\underline {\vec {a}},\ast )$	the set of all database facts of the form $R(\underline {\vec {a}},\vec {b})$, for some $\vec {b}$
sjfBCQ	the class of self-join-free Boolean conjunctive queries
UCQ	the class of unions of conjunctive queries
$R^{c}$	a relation name of mode c, which must be interpreted by a consistent relation
${q}^{\mathsf {cons}}$	the set of all atoms of query q having a relation name of mode c
${\mathcal {K}}({q})$	the set containing ${{\mathsf {key}}({F})}\rightarrow {{\mathsf {vars}}({F})}$ for every F ∈ q
${F}^{+,{q}}$	the closure of key(F) with respect to the FDs in ${\mathcal {K}}({q\setminus \{F\}})\cup {\mathcal {K}}({{q}^{\mathsf {cons}}})$
genre_q(A)	the atom of q with the same relation name as the fact A
V(G)	the vertex set of a graph G
E(G)	the edge set of a graph G
⊎	a set union that happens to be disjoint

Appendix B: Proofs of Section 5

B.1 Proofs of Lemmas 1 and 2

Proof of Lemma 1

Let o₁ and o₂ be garbage sets for q₀ in db. For every i ∈{1, 2}, we can assume a repair r_i of o_i such that

Garbage Condition: for every valuation 𝜃 over vars(q) such that $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{i}})\cup {\mathbf {r}}_{i}$, we have 𝜃(q₀) ∩r_i = ∅.

Let ${\mathbf {o}}_{2}^{-} = {\mathbf {o}}_{2}\setminus {\mathbf {o}}_{1}$ and ${\mathbf {r}}_{2}^{-} = {\mathbf {r}}_{2}\setminus {\mathbf {o}}_{1}$. Then, ${\mathbf {r}}_{1}\uplus {\mathbf {r}}_{2}^{-}$ is a repair of ${\mathbf {o}}_{1}\uplus {\mathbf {o}}_{2}^{-}$, where the use of ⊎ (rather than ∪) indicates that the operands of the union are disjoint. Let 𝜃 be an arbitrary valuation over vars(q) such that

$$\theta(q)\subseteq\left({\mathbf{db}\setminus({{\mathbf{o}}_{1}\uplus{\mathbf{o}}_{2}^{-}})}\right)\cup({{\mathbf{r}}_{1}\uplus{\mathbf{r}}_{2}^{-}}).$$

Then, $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{1}})\cup {\mathbf {r}}_{1}$. Consequently, by the Garbage Condition for i = 1, 𝜃(q₀) ∩r₁ = ∅, and therefore 𝜃(q₀) ∩o₁ = ∅. It follows $\theta (q)\subseteq \left ({\mathbf {db}\setminus ({{\mathbf {o}}_{1}\cup {\mathbf {o}}_{2}})}\right )\cup {\mathbf {r}}_{2}^{-}$, hence $\theta (q)\subseteq \left ({\mathbf {db}\setminus {\mathbf {o}}_{2}}\right )\cup {\mathbf {r}}_{2}^{-}$. Consequently, by the Garbage Condition for i = 2, $\theta (q_{0})\cap {\mathbf {r}}_{2}^{-}=\emptyset $. It follows that ${\mathbf {o}}_{1}\uplus {\mathbf {o}}_{2}^{-}$=o₁ ∪o₂ is a garbage set for q₀ in db. □

Proof of Lemma 2

The ⇐=-direction is trivial. For the ⇒-direction, assume that every repair of db satisfies q. We can assume a repair r₀ of o such that for every valuation 𝜃 over vars(q), if $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}_{0}$, then 𝜃(q₀) ∩r₀ = ∅. Let r be an arbitrary repair of db ∖o. It suffices to show r⊧q. Since r ∪r₀ is a repair of db, we can assume a valuation 𝜃 over vars(q) such that $\theta (q)\subseteq {\mathbf {r}}\cup {\mathbf {r}}_{0}$. Since $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}_{0}$ is obvious, it follows 𝜃(q) ∩r₀ = ∅. Consequently, $\theta (q)\subseteq {\mathbf {r}}$, hence r⊧q. This concludes the proof. □

B.2 Proof of Lemma 3

We will use two helping lemmas.

Lemma 13

Let q be a query in sjfBCQ, and let $q_{0}\subseteq q$. Let o be a garbage set for q₀ in db. If p is the union of one or more blocks of o, then o ∖p is a garbage set for q₀ in db ∖p.

Proof

Let p be the union of one or more blocks of o. We can assume a repair r of o such that for every valuation 𝜃 over vars(q), if $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}$, then 𝜃(q) ∩r = ∅. Let s = r ∖p. Obviously, s is a repair of o ∖p.

Let 𝜃 be a valuation over vars(q) such that $\theta (q)\subseteq \left ({({\mathbf {db}\setminus {\mathbf {p}}})\setminus ({{\mathbf {o}}\setminus {\mathbf {p}}})}\right )\cup {\mathbf {s}}$. It suffices to show 𝜃(q) ∩s = ∅. Since $\left ({\mathbf {db}\setminus {\mathbf {p}}}\right )\setminus \left ({{\mathbf {o}}\setminus {\mathbf {p}}}\right )\subseteq \mathbf {db}\setminus {\mathbf {o}}$ and ${\mathbf {s}}\subseteq {\mathbf {r}}$, it follows $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}$, hence 𝜃(q) ∩r = ∅. It follows 𝜃(q) ∩s = ∅. □

Corollary 1

Let q be a query in sjfBCQ, and let $q_{0}\subseteq q$. Let o be a garbage set for q₀ in db. If every garbage set for q₀ in db ∖o is empty, then o is the maximum garbage set for q₀ in db.

Proof

Proof by contraposition. Assume that o is not the maximum garbage set for q₀ in db. Let o₀ be the maximum garbage set for q₀ in db. By Lemma 13, o₀ ∖o is a nonempty garbage set for q₀ in db ∖o. □

Lemma 14

Let q be a query in sjfBCQ, and let $q_{0}\subseteq q$. Let db be a database. If o is a garbage set for q₀ in db, and p is a garbage set for q₀ in db ∖o, then o ∪p is a garbage set for q₀ in db.

Proof

Assume the hypothesis holds. Note that o ∩p = ∅. We can assume a repair r of o such that for every valuation 𝜃 over vars(q), if $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}$, then 𝜃(q) ∩r = ∅. Likewise, we can assume a repair s of p such that for every valuation 𝜃 over vars(q), if $\theta (q)\subseteq \left ({({\mathbf {db}\setminus {\mathbf {o}}})\setminus {\mathbf {p}}}\right )\cup {\mathbf {s}}$, then 𝜃(q) ∩s = ∅. Obviously, r ∪s is a repair of o ∪p.

Let 𝜃 be a valuation over vars(q) such that $\theta (q)\subseteq \left ({\mathbf {db}\setminus ({{\mathbf {o}}\cup {\mathbf {p}}})}\right )\cup ({{\mathbf {r}}\cup {\mathbf {s}}})$. From the set inclusion $\left ({\mathbf {db}\setminus ({{\mathbf {o}}\cup {\mathbf {p}}})}\right )\cup ({{\mathbf {r}}\cup {\mathbf {s}}}) \subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}$, it follows $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}$, hence 𝜃(q) ∩r = ∅. Then, $\theta (q)\subseteq \left ({\mathbf {db}\setminus ({{\mathbf {o}}\cup {\mathbf {p}}})}\right )\cup {\mathbf {s}} = \left ({({\mathbf {db}\setminus {\mathbf {o}}})\setminus {\mathbf {p}}}\right )\cup {\mathbf {s}}$, hence 𝜃(q) ∩s = ∅. It follows 𝜃(q) ∩ (r ∪s) = ∅. □

Corollary 2

Let q be a query in sjfBCQ, and let $q_{0}\subseteq q$. Let db be a database, and let o be the maximum garbage set for q₀ in db. Then, every garbage set for q₀ in db ∖o is empty.

Proof

Immediate from Lemma 14. □

The proof of Lemma 3 can now be given.

Proof of Lemma 3

Immediate from Corollaries 1 and 2. □

Appendix C: Appendix to Section 7

C.1 Proofs of Lemmas 5 and 6

Proof of Lemma 5

We will write ⊕ for addition modulo k. We first consider garbage sets respecting the first three conditions.

Let A be a fact of db such that ${\mathsf {genre}}_{q}({A})\in \{F_{0},\dots ,F_{k-1}\}$ and A has zero outdegree in the ↪_C-graph. Then, there exists no valuation 𝜃 over vars(q) such that $A\in \theta (q)\subseteq \mathbf {db}$. It is obvious that block(A,db) is a garbage set for C in db.
Let $A_{0}\stackrel {{~}_{C}}{\hookrightarrow }A_{1}\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }A_{k-1}\stackrel {{~}_{C}}{\hookrightarrow }A_{0}$ be an irrelevant 1-embedding of C in db. Assume without loss of generality that for every $i\in \{0,\dots ,k-1\}$, genre_q(A_i) = F_i. Let ${\mathbf {o}}=\bigcup _{i=0}^{k-1}{\mathsf {block}}({A_{i}},{\mathbf {db}})$. Let ${\mathbf {r}}=\{A_{0},\dots ,A_{k-1}\}$, which is obviously a repair of o. We show that o is a garbage set for C in db. Assume, toward a contradiction, the existence of a valuation 𝜃 over vars(q) such that for some $i\in \{0,\dots ,k-1\}$, $A_{i}\in \theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}$. Then, $\theta (F_{i})\stackrel {{~}_{C}}{\hookrightarrow }\theta (F_{i\oplus 1})$. Since 𝜃(F_i) = A_i, we have $A_{i}\stackrel {{~}_{C}}{\hookrightarrow }\theta (F_{i\oplus 1})$. From $A_{i}\stackrel {{~}_{C}}{\hookrightarrow }\theta (F_{i\oplus 1})$ and $A_{i}\stackrel {{~}_{C}}{\hookrightarrow }A_{i\oplus 1}$, it follows $\theta (F_{i\oplus 1})\sim A_{i\oplus 1}$ by Lemma 4. Since $\theta (F_{i\oplus 1})\in ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}$, it follows $\theta (F_{i\oplus 1})=A_{i\oplus 1}$. By repeated application of the same reasoning, for every $j\in \{0,\dots ,k-1\}$, 𝜃(F_j) = A_j. But then $A_{0}\stackrel {{~}_{C}}{\hookrightarrow }A_{1}\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }A_{k-1}\stackrel {{~}_{C}}{\hookrightarrow }A_{0}$ is a relevant 1-embedding of C in db, a contradiction.
Let r be a set containing all (and only) the facts of some n-embedding of C in db with n ≥ 2. Let ${\mathbf {o}}=\bigcup _{A\in {\mathbf {r}}}{\mathsf {block}}({A},{\mathbf {db}})$. It can be shown that o is a garbage set for C in db; the argumentation is analogous to the reasoning in the previous paragraph.

Let o₀ be the minimal subset of db that satisfies all conditions in the statement of the lemma except the recursive Condition 4. By Lemma 1 and our reasoning in the previous items, it follows that o₀ is a garbage set for C in db.

Note that the first three conditions do not recursively depend on o₀. Starting with o₀, construct a maximal sequence

$${\mathbf{o}}_{0},\mu_{0},{\mathbf{o}}_{1},\mu_{1},{\mathbf{o}}_{2},\mu_{2},\dots,{\mathbf{o}}_{m},\mu_{m},{\mathbf{o}}_{m+1}$$

such that ${\mathbf {o}}_{0}\subsetneq {\mathbf {o}}_{1}\subsetneq {\mathbf {o}}_{2}\subsetneq \dotsm \subsetneq {\mathbf {o}}_{m+1}$ and for every $h\in \{0,1,\dots ,m\}$,

1. μ_h is a valuation over vars(q) such that $\mu _{h}(q)\subseteq \mathbf {db}$ and μ_h(q) ∩ o_h ≠ ∅. Therefore, $\mu _{h}(F_{0})\stackrel {{~}_{C}}{\hookrightarrow }\mu _{h}(F_{1})\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }\mu _{h}(F_{k-1})\stackrel {{~}_{C}}{\hookrightarrow }\mu _{h}(F_{0})$ is a relevant 1-embedding of C in db; and
2. ${\mathbf {o}}_{h+1}={\mathbf {o}}_{h}\cup \left ({\bigcup _{i=0}^{k-1}{\mathsf {block}}({\mu _{h}(F_{i})},{\mathbf {db}})}\right )$.
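The construction of the strictly increasing sequence is a fixpoint iteration, which the following minimal sketch mimics. It assumes that the relevant 1-embeddings have been computed elsewhere and are passed in as sets of facts (one set {μ(F₀), …, μ(F_{k−1})} per valuation μ), and that `block` returns the set of facts key-equal to a given fact. All names and the toy data are hypothetical.

```python
def saturate(o0, embeddings, block):
    """Grow o0 until no supplied embedding that touches the set can add facts.

    o0: initial set of facts; embeddings: list of sets of facts;
    block: function mapping a fact to the frozenset of its key-equal facts.
    """
    o = set(o0)
    changed = True
    while changed:
        changed = False
        for facts in embeddings:
            touches = bool(facts & o)
            adds_something = any(not block(A) <= o for A in facts)
            if touches and adds_something:  # guarantees a strict increase
                for A in facts:
                    o |= block(A)
                changed = True
    return o

# Toy data: b1 and b2 are key-equal; every other fact is alone in its block.
block_of = {"a1": {"a1"}, "b1": {"b1", "b2"}, "b2": {"b1", "b2"},
            "c1": {"c1"}, "d1": {"d1"}, "e1": {"e1"}}
block = lambda A: frozenset(block_of[A])
embeddings = [{"a1", "b1"}, {"b2", "c1"}, {"d1", "e1"}]

print(sorted(saturate({"a1"}, embeddings, block)))  # ['a1', 'b1', 'b2', 'c1']
```

The loop stops because every round that sets `changed` strictly enlarges the set, which is bounded by the finite database; this mirrors the maximality of the sequence.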

It is clear that the final set ${\mathbf {o}}_{m+1}$ is a minimal set satisfying all conditions in the statement of the lemma. We show by induction on increasing h that for all $h\in \{0,1,\dots ,m,m+1\}$, o_h is a garbage set for C in db. We have already shown that o₀ is a garbage set for C in db. For the induction step, $h\rightarrow h+1$, the induction hypothesis is that o_h is a garbage set for C in db. Then, there exists a repair r of o_h such that for every valuation 𝜃 over vars(q), if $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h}})\cup {\mathbf {r}}$, then 𝜃(q) ∩ r = ∅. For every $i\in \{0,\dots ,k-1\}$, define A_i := μ_h(F_i). Let ${\mathbf {s}}=\{A_{0},\dots ,A_{k-1}\}\setminus {\mathbf {o}}_{h}$. We have ${\mathbf {o}}_{h+1}={\mathbf {o}}_{h}\uplus \left ({\bigcup _{A_{j}\in {\mathbf {s}}}{\mathsf {block}}({A_{j}},{\mathbf {db}})}\right )$. Let ${\mathbf {r}}^{\prime }={\mathbf {r}}\uplus {\mathbf {s}}$. Obviously, ${\mathbf {r}}^{\prime }$ is a repair of ${\mathbf {o}}_{h+1}$. Here, we use ⊎, rather than ∪, to make clear that the operands of the union are disjoint.

Assume, toward a contradiction, the existence of a valuation 𝜃 over vars(q) such that $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h+1}})\cup {\mathbf {r}}^{\prime }$ and $\theta (q)\cap {\mathbf {r}}^{\prime }\neq \emptyset $. Since $({\mathbf {db}\setminus {\mathbf {o}}_{h+1}})\cup {\mathbf {r}}^{\prime }\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h}})\cup {\mathbf {r}}$, it follows $\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h}})\cup {\mathbf {r}}$, hence 𝜃(q) ∩ r = ∅ by our initial hypothesis. It must be the case that 𝜃(q) ∩ s ≠ ∅. We can assume $i\in \{0,\dots ,k-1\}$ such that A_i ∈ 𝜃(q) ∩ s. We have $\theta (F_{i})\stackrel {{~}_{C}}{\hookrightarrow }\theta (F_{i\oplus 1})$. Since 𝜃(F_i) = A_i, we have $A_{i}\stackrel {{~}_{C}}{\hookrightarrow }\theta (F_{i\oplus 1})$. From $A_{i}\stackrel {{~}_{C}}{\hookrightarrow }\theta (F_{i\oplus 1})$ and $A_{i}\stackrel {{~}_{C}}{\hookrightarrow }A_{i\oplus 1}$, it follows $\theta (F_{i\oplus 1})\sim A_{i\oplus 1}$ by Lemma 4. Therefore, $\theta (F_{i\oplus 1})\in {\mathsf {block}}({A_{i\oplus 1}},{\mathbf {db}})$. Two cases are possible:

Case that ${\mathsf {block}}({A_{i\oplus 1}},{\mathbf {db}})\subseteq {\mathbf {o}}_{h}$: Since $\theta (F_{i\oplus 1})\in ({\mathbf {db}\setminus {\mathbf {o}}_{h}})\cup {\mathbf {r}}$, it must be the case that $\theta (F_{i\oplus 1})\in {\mathbf {r}}$. However, since we have previously argued that 𝜃(q) ∩ r = ∅, we conclude that this case cannot occur.
Case that ${\mathsf {block}}({A_{i\oplus 1}},{\mathbf {db}})\not \subseteq {\mathbf {o}}_{h}$: By our definition of s, we have $A_{i\oplus 1}\in {\mathbf {s}}$. Since $\theta (F_{i\oplus 1})\in ({\mathbf {db}\setminus {\mathbf {o}}_{h+1}})\cup {\mathbf {r}}^{\prime }$, it must be the case that $\theta (F_{i\oplus 1})\in {\mathbf {s}}$, and therefore $\theta (F_{i\oplus 1})=A_{i\oplus 1}$.

From the above cases, it follows that $A_{i\oplus 1}\in \theta (q)\cap {\mathbf {s}}$. By repeating the same reasoning, we obtain that A_j ∈ 𝜃(q) ∩ s for all $j\in \{0,\dots ,k-1\}$. Since μ_h(q) ∩ o_h ≠ ∅ by our construction, we can assume the existence of $\ell \in \{0,\dots ,k-1\}$ such that A_ℓ ∈ o_h, hence A_ℓ ∉ s, which contradicts our earlier finding that each A_j belongs to 𝜃(q) ∩ s. This concludes the induction step. It is correct to conclude that ${\mathbf {o}}_{m+1}$ is a garbage set for C in db.

Let $\mathbf {db}^{\prime }=\mathbf {db}\setminus {\mathbf {o}}_{m+1}$. We show that the garbage set for C in $\mathbf {db}^{\prime }$ is empty. Assume, toward a contradiction, that o is a nonempty garbage set for C in $\mathbf {db}^{\prime }$. We can assume a repair r of o such that for every valuation 𝜃 over vars(q), if $\theta (q)\subseteq ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}$, then 𝜃(q) ∩r = ∅.

We show that for any A ∈ r, the ↪_C-graph contains an infinite path that starts from A such that any vertex on the path belongs to $({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}$ and any (contiguous) subpath of length k contains some fact from r. To this end, let A be a fact of r. By our construction, there exists a valuation μ over vars(q) such that $A\in \mu (q)\subseteq \mathbf {db}^{\prime }$ (otherwise A would belong to ${\mathbf {o}}_{m+1}$). Hence, $\mu (F_{0})\stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{1})\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{k-1})\stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{0})$ is a relevant 1-embedding of C in $\mathbf {db}^{\prime }$ that contains A. Then, for some $i\in \{0,\dots ,k-1\}$, it must be the case that $\mu (F_{i})\not \in ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}$ (or else $\mu (q)\subseteq ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}$ and μ(q) ∩ r ≠ ∅, a contradiction). Therefore, the ↪_C-graph contains a shortest path π of length < k from A to some fact B ∈ o ∖ r. Then, there exists $B^{\prime }\in {\mathbf {r}}$ such that $B^{\prime }\sim B$ and the ↪_C-graph contains a path of length < k from A to $B^{\prime }$. This path is obtained by substituting $B^{\prime }$ for B in π. Since $B^{\prime }\in {\mathbf {r}}$, we can continue the path by applying the same reasoning as for A. The path is illustrated by Fig. 10.

Since the directed path is infinite, it has a shortest finite subpath of length ≥ k whose first vertex is key-equal to its last vertex. Let D be the last but one vertex on this subpath. Since the ↪_C-graph contains a directed edge from D to the first vertex of the subpath, it contains a cycle of some length nk with n ≥ 1. Since this cycle is obviously an n-embedding of C in $\mathbf {db}^{\prime }=\mathbf {db}\setminus {\mathbf {o}}_{m+1}$, it must be a relevant 1-embedding of C in $\mathbf {db}^{\prime }$ which, moreover, contains some fact of r. Therefore, there exists a valuation μ over vars(q) such that $\mu (q)\subseteq ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}$ and μ(q) ∩ r ≠ ∅, a contradiction.

Since the garbage set for $\mathbf {db}\setminus {\mathbf {o}}_{m+1}$ is empty, it follows by Lemma 3 that ${\mathbf {o}}_{m+1}$ is the maximum garbage set for C in db. This concludes the proof. □

Proof of Lemma 6

For the first item, let $A\stackrel {{~}_{C}}{\hookrightarrow }A^{\prime }$ be any edge of the n-embedding. We can assume $F,F^{\prime }\in C$ such that $F\stackrel {\mathsf {{~}_{M}}}{\longrightarrow } F^{\prime }$, genre_q(A) = F, and ${\mathsf {genre}}_{q}({A^{\prime }})=F^{\prime }$. Then, the block-quotient graph will contain a directed edge from block(A,db) to ${\mathsf {block}}({A^{\prime }},{\mathbf {db}})$. It is then obvious that $({\mathbf {b}}_{0},{\mathbf {b}}_{1},\dots ,{\mathbf {b}}_{nk-1},{\mathbf {b}}_{0})$ is a directed cycle in the block-quotient graph; this cycle is elementary because no two distinct facts of an n-embedding are key-equal.

For the second item, let $i\in \{0,\dots ,nk-1\}$. Since $({\mathbf {b}}_{i},{\mathbf {b}}_{i+1\mod nk})$ is an edge in the block-quotient graph, we can assume A_i ∈ b_i and $A^{\prime }\in {\mathbf {b}}_{i+1\mod nk}$ such that $A_{i}\stackrel {{~}_{C}}{\hookrightarrow }A^{\prime }$. By Lemma 4, it will be the case that $A_{0}\stackrel {{~}_{C}}{\hookrightarrow }A_{1}\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }{A_{nk-1}}\stackrel {{~}_{C}}{\hookrightarrow }A_{0}$. Furthermore, the latter ↪_C-cycle is an n-embedding. Indeed, since the cycle $({\mathbf {b}}_{0},{\mathbf {b}}_{1},\dots ,{\mathbf {b}}_{nk-1},{\mathbf {b}}_{0})$ is elementary, no two distinct facts among the A_i are key-equal. This concludes the proof. □

1.2 C.2 Proof of Lemma 8

We will use the following helping lemma. If G is a directed graph, then a directed cycle in G of length k is called a k-cycle.

Lemma 15

Let G = (V,E) be an instance of LONGCYCLE(k). Let $\widehat {G}=(\widehat {V},\widehat {E})$ be the undirected graph whose vertices are the k-cycles of G. There is an undirected edge between any two distinct k-cycles P₁ and P₂ if V (P₁) ∩ V (P₂)≠∅. Then, the following are equivalent:

1.
$\widehat {G}$ has a chordless cycle of length ≥ 2k or G has an elementary directed cycle of length nk with 2 ≤ n ≤ 2k − 3.
2.
G contains an elementary directed cycle of length ≥ 2k.

Proof

Since the graph G is k-partite, every k-cycle is elementary.

Assume that 1 holds true. The result is obvious if there exists n such that 2 ≤ n ≤ 2k − 3 and G has an elementary cycle of length nk. Assume next that $\widehat {G}$ has a chordless elementary cycle $(P_{0}, P_{1}, \dots , P_{m-1}, P_{0})$ of length m ≥ 2k. We construct a cycle C in G using the following procedure. The construction will define a labeling function ℓ from the vertices in C to $\{0,1,\dots ,m-1\}$. It will be the case that $w\in V(P_{\ell (w)})$ for every vertex w in C. We start with any vertex $v_{0}\in V(P_{m-1})\cap V(P_{0})$ and define its label as $\ell (v_{0})\mathrel {\mathop :}= 0$. At any point of the procedure, if we are at vertex u with label ℓ(u), we choose the next vertex w in C to be the next vertex in the k-cycle $P_{\ell (u)}$. If ℓ(u) < m − 1 and w also belongs to $P_{\ell (u)+1}$, we let $\ell (w)\mathrel {\mathop :}=\ell (u)+1$; otherwise $\ell (w)\mathrel {\mathop :}=\ell (u)$. The procedure terminates when we attempt to add a vertex that already exists in C, and therefore C will be elementary.

We first show that the termination condition will not be met for any vertex distinct from v₀. Suppose, toward a contradiction, that the sequence constructed so far is $C = \langle {v_{0}, v_{1}, \dots , v_{n}}\rangle $, ℓ(v_n) = i ≤ m − 1, and the next vertex in P_i is some v_j with $j\in \{1, \dots , n-1\}$. Since v_j belongs to both P_i and $P_{\ell (v_{j})}$, it must be the case that ℓ(v_j) ≥ i − 1, because otherwise $\{P_{i},P_{\ell (v_{j})}\}$ is a chord in $(P_{0}, P_{1}, \dots , P_{m-1}, P_{0})$, a contradiction. We now distinguish two cases:

Case ℓ(v_j) = i − 1. Then, $v_{j}\in V(P_{i-1})\cap V(P_{i})$. By the procedure, this means that $\ell (v_{j-1})=i-2$. Indeed, if $\ell (v_{j-1})=i-1$, then the procedure would have set ℓ(v_j) to i, because v_j also belongs to P_i. But then this also implies that $v_{j}\in V(P_{i-2})$, a contradiction to the fact that the cycle is chordless.
Case ℓ(v_j) = i. Then the procedure reaches a vertex on P_i that has been visited before. Therefore, starting with this previously visited vertex on P_i, the procedure has entirely traversed P_i without ever reaching a vertex of $P_{i+1\mod m}$, contradicting that P_i and $P_{i+1\mod m}$ have a vertex in common.

It is now clear that at some point we will reach v₀. Indeed, when the label becomes m − 1, the procedure will follow the edges of $P_{m-1}$ until it reaches v₀. We have that ℓ(v₀) = 0, and the procedure is such that if some vertex has label i with i < m − 1, then there is a vertex with label i + 1. Therefore, for every $i\in \{0,1,\dots ,m-1\}$, there exists at least one vertex u in C such that ℓ(u) = i. Therefore, C has at least m vertices. Since m ≥ 2k, the cycle C has length ≥ 2k.
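The labeling procedure used in this direction of the proof can be sketched in Python. This is only an illustrative sketch: the function name `stitch_cycle`, the encoding of each k-cycle as a list of vertices in traversal order, and the toy instance with k = 2 and m = 4 are assumptions made for the example and do not come from the article.

```python
def stitch_cycle(k_cycles):
    """Follow the labeling procedure on a chordless cycle (P_0, ..., P_{m-1}).

    Each k-cycle is a list of vertices in traversal order (hypothetical encoding).
    Returns the visited vertices, their labels, and the vertex that closed the walk.
    """
    m = len(k_cycles)
    # Start at some v0 in V(P_{m-1}) ∩ V(P_0) and give it label 0.
    v0 = next(v for v in k_cycles[0] if v in k_cycles[m - 1])
    walk, label = [v0], {v0: 0}
    u = v0
    while True:
        i = label[u]
        cycle = k_cycles[i]
        # The next vertex is the successor of u in the k-cycle P_{l(u)}.
        w = cycle[(cycle.index(u) + 1) % len(cycle)]
        if w in label:  # termination: w already occurs in C
            return walk, label, w
        # Increase the label when w also lies on P_{l(u)+1}.
        label[w] = i + 1 if i < m - 1 and w in k_cycles[i + 1] else i
        walk.append(w)
        u = w


# Toy instance (k = 2, m = 4): consecutive 2-cycles share one vertex and
# non-consecutive ones are disjoint, so (P_0, P_1, P_2, P_3, P_0) is chordless.
P = [["a0", "b0"], ["b0", "a1"], ["a1", "b1"], ["b1", "a0"]]
walk, label, closing = stitch_cycle(P)
print(walk, closing)
```

On this toy input the walk closes at its start vertex and every label in {0, 1, 2, 3} is used, so the constructed cycle has at least m = 2k vertices, as the argument above predicts.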

For the converse direction, assume that G contains an elementary directed cycle of length ≥ 2k (that is, 2 holds true), and that for all 2 ≤ n ≤ 2k − 3, G contains no elementary directed cycle of length nk (otherwise 1 holds trivially).

We will show that $\widehat {G}$ contains a chordless cycle of length ≥ 2k.

We first introduce some notions that will be useful in the proof. A subpath of a directed path is a consecutive subsequence of edges of that path. Every path is a subpath of itself. We write start(π) and end(π) to denote, respectively, the first and the last vertex of a directed path π. If $\mathsf {end}({\pi })=\mathsf {start}({\pi ^{\prime }})$, then $\pi \cdot \pi ^{\prime }$ denotes the concatenation of paths π and $\pi ^{\prime }$. The length of a (possibly closed) elementary path π is the number of edges it contains, and is denoted length(π).

Covering Let O be an elementary cycle in G of size ≥ 2k. A seam in O is a subpath of O that is also a subpath of some k-cycle. Obviously, every seam in O has length < k. A covering of O is a set of edge-disjoint seams in O such that every edge of O is an edge of some seam in the set. Since every edge of G belongs to some k-cycle by our hypothesis, O has a covering. We define ${\mathit {seamlength}}({O})\mathrel {\mathop :}=\ell $ if O has a covering of cardinality ℓ and every covering of O has cardinality ≥ ℓ.

Cyclic Ordering of the Seams in a Covering Let $C=\{S_{0},S_{1},\dots ,S_{\ell -1}\}$ be a covering of O. From here on, we will assume that the seams are listed such that a traversal of O that starts with start(S₀) traverses the seams of C in the order $S_{0},S_{1},\dots ,S_{\ell -1}$.

Let O be a directed cycle of length ≥ 2k that minimizes seamlength(⋅). From here on, ℓ denotes seamlength(O). Thus, every elementary cycle $O^{\prime }$ in G of length ≥ 2k satisfies ${\mathit {seamlength}}({O^{\prime }})\geq \ell $. Let $\{S_{0},S_{1},\dots ,S_{\ell -1}\}$ be a covering of O.

Our hypothesis is that for every directed cycle of length nk in G such that n ≥ 2, we have n > 2k − 3. Consequently, length(O) ≥ (2k − 2)k. For every $i\in \{0,\dots ,\ell -1\}$, we have length(S_i) ≤ k − 1 (because O is elementary with length(O) ≥ 2k). Therefore, $(2k-2)k\leq {\mathit {length}}({O})={\sum }_{i=0}^{\ell -1}{\mathit {length}}({S_{i}})\leq \ell (k-1)$, which implies ℓ ≥ 2k.
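Spelled out, the last implication is a one-line computation. It assumes k ≥ 2, so that k − 1 > 0 and dividing by k − 1 preserves the inequality:

$$ \ell \;\geq\; \frac{(2k-2)k}{k-1} \;=\; \frac{2(k-1)k}{k-1} \;=\; 2k. $$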

For every $i\in \{0,\dots ,\ell -1\}$, let P_i be a k-cycle of which S_i is a subpath. We define the fitness of P_i as ${\mathit {length}}({S_{i}^{\prime }})$ if $S_{i}^{\prime }$ is the longest subpath of P_i that has S_i as a subpath and that is still a seam in O. Note that the fitness of P_i is at least length(S_i). For a reason that will become apparent shortly, if multiple choices for the k-cycle P_i are possible, we will choose a k-cycle with the greatest fitness. Assume, toward a contradiction, that the subgraph of $\widehat {G}$ induced by $\{P_{0},P_{1},\dots ,P_{\ell -1}\}$ has a cycle chord. We can assume without loss of generality $m\in \{2,\dots ,\ell -2\}$ and a path $(P_{0},P_{1},\dots ,P_{m-1},P_{m})$ in $\widehat {G}$ such that $\{P_{0},P_{m}\}\in E(\widehat {G})$, while the paths $(P_{0},P_{1},\dots ,P_{m-1})$ and $(P_{1},\dots ,P_{m-1},P_{m})$ are chordless. From $\{P_{0},P_{m}\}\in E(\widehat {G})$, it follows that V (P₀) ∩ V (P_m)≠∅. We have V (S₀) ∩ V (S_m) = ∅. Let π be the closed directed path in G that, starting from start(S_m), traverses P_m until a vertex (call it x) of P₀ is reached. From x on, the path π follows P₀ until end(S₀) is reached, and then traverses $S_{1},S_{2},\dots ,S_{m-1}$. Note that it is possible that x ∈ V (S_m) or x ∈ V (S₀) (but not both). We argue next that π is an elementary cycle.

The edges of π that are not in O belong either to the subpath (call it π_m) of P_m that goes from end(S_m) to x, or to the subpath (call it π₀) of P₀ that goes from x to start(S₀). Note that π_m exists only if x ∉ V(S_m), and π₀ exists only if x ∉ V(S₀). Assume toward a contradiction that π is not elementary. From our hypotheses and construction, it must be the case that π_m intersects $S_{m-1}$ in some vertex y, or that π₀ intersects S₁ in some vertex z. These possibilities are depicted in Fig. 11. If this happens, however, $P_{m}^{\prime }$ and $P_{0}^{\prime }$ have a strictly greater fitness than P_m and P₀, contradicting that we chose k-cycles with the greatest fitness. Here, $P_{m}^{\prime }$ is the k-cycle that, starting from $\mathsf {end}(S_{m-1})=\mathsf {start}(S_{m})$, traverses P_m until y, and then follows $P_{m-1}$ from y until $\mathsf {end}(S_{m-1})$. Similarly, $P_{0}^{\prime }$ is the k-cycle that, starting from end(S₀) = start(S₁), traverses P₁ until z, and then follows P₀ from z until end(S₀). To see that $P_{m}^{\prime }$ has a strictly greater fitness than P_m, note that the subpath of $P_{m}^{\prime }$ from y to end(S_m) is a seam of O. Since $x\notin V(S_{m-1})$, P_m will cover a strictly smaller suffix of $S_{m-1}$ than $P_{m}^{\prime }$ does.

We show that both length(π) = k and length(π) ≥ 2k lead to a contradiction.

Assume that π is a k-cycle. Then either $S_{0}\cdot S_{1}\cdot \dotsm \cdot S_{m-1}$ is a seam of O or $S_{1}\cdot S_{2}\cdot \dotsm \cdot S_{m}$ is a seam of O. Since m ≥ 2, we can use π to construct a covering of O of cardinality < ℓ, a contradiction.
Assume that length(π) ≥ 2k. It can be easily seen that π has a covering of cardinality m + 1 < ℓ, which contradicts our assumption about O.

□

The proof of Lemma 8 can now be given.

Proof of Lemma 8

Let G = (V,E) be an instance of LONGCYCLE(k). Let $\widehat {G}=(\widehat {V},\widehat {E})$ be the undirected graph defined in the statement of Lemma 15. Obviously, it suffices to show that Condition 1 in the statement of Lemma 15 can be expressed in SymStratDatalog.

All elementary cycles in G of length nk for 2 ≤ n ≤ 2k − 3 can obviously be found in FO. We now outline a program in SymStratDatalog that tests for the existence of chordless cycles in $\widehat {G}$ of length ≥ 2k. The graph $\widehat {G}$ can be constructed in SymStratDatalog. Then, the existence of a chordless cycle of length ≥ 2k can be tested as follows: Check whether there exists a path $(P_{0},P_{1},P_{2},\dots ,P_{2k-2},P_{2k-1},P_{2k})$ such that (i) the subpath $(P_{1},\dots ,P_{2k-1})$ is elementary and chordless, and (ii) the endpoints $P_{0}$ and $P_{2k}$ are also connected by another (possibly single-vertex) path that uses no vertex that is equal or adjacent to a vertex in $\{P_{2},\dots ,P_{2k-2}\}$. In particular, $P_{0}$ and $P_{2k}$ themselves must then be distinct from and not adjacent to the vertices in $\{P_{2},\dots ,P_{2k-2}\}$, and, consequently, $P_{0}\neq P_{1}$ and $P_{2k}\neq P_{2k-1}$. The single-vertex path occurs if $P_{0}=P_{2k}$.

We now give the details of the SymStratDatalog program. The following rule states that the vertices of $\widehat {G}$ are the k-cycles of G.

$$ \widehat{V}(x_{0},\dots,x_{k-1}) \leftarrow E(x_{0},x_{1}),E(x_{1},x_{2}),\dots,E(x_{k-2},x_{k-1}),E(x_{k-1},x_{0}) $$

Note incidentally that every k-cycle is stored k times in this way. Since the graph G is k-circle-layered (see Definition 7), we can assume some fixed partition $V_{0},V_{1},\dots ,V_{k-1}$ of the vertex set V. We will say that the IDB fact $\widehat {V}(a_{0},\dots ,a_{k-1})$ is of class V_i if a₀ ∈ V_i. Thus, if $\widehat {V}(a_{0},a_{1},\dots ,a_{k-1})$ is of class V_i, then $\widehat {V}(a_{1},\dots ,a_{k-1},a_{0})$ is of class $V_{i+1\mod k}$. If one partition class were given as part of the input, for example as EDB facts V0(a), then the previous rule could be optimized by adding V0(x₀) to its body.
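To make the remark about k-fold storage concrete, the following Python sketch evaluates the body of the rule for $\widehat {V}$ on a small edge relation. The function name `k_cycle_facts` and the toy graph (with assumed layers {a}, {b, b2}, {c}) are hypothetical and serve only as an illustration of what the rule derives.

```python
def k_cycle_facts(edges, k):
    """Return all tuples (x_0, ..., x_{k-1}) with E(x_0,x_1), ..., E(x_{k-1},x_0)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    facts = set()

    def extend(path):
        if len(path) == k:
            # Close the cycle with the edge E(x_{k-1}, x_0).
            if path[0] in succ.get(path[-1], ()):
                facts.add(tuple(path))
            return
        for w in succ.get(path[-1], ()):
            extend(path + [w])

    for u in list(succ):
        extend([u])
    return facts


# Toy graph for k = 3 (assumed layers {a}, {b, b2}, {c}) with two 3-cycles.
E = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "b2"), ("b2", "c")}
facts = k_cycle_facts(E, 3)
print(sorted(facts))
```

Each of the two 3-cycles appears once per rotation, which gives six facts in total; restricting x₀ to one partition class, as suggested above, would keep one rotation per cycle.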

We will need an equality test on vertices of $\widehat {G}$:

$$ \mathit{Eq}(x_{0},\dots,x_{k-1};x_{0},\dots,x_{k-1}) \leftarrow \widehat{V}(x_{0},\dots,x_{k-1}) $$

The use of the semicolon is for readability only. The following rules compute edges in $\widehat {G}$. For every $\ell \in \{0,\dots ,k-1\}$, add the rules:

$$ \mathit{\widehat{E}}(x_{0},\dots,x_{k-1};y_{0},\dots,y_{k-1}) \leftarrow \left\{ \begin{array}{l} \widehat{V}(x_{0},\dots,x_{k-1}),\widehat{V}(y_{0},\dots,y_{k-1}),\\[1.0ex] \neg\mathit{Eq}(x_{0},\dots,x_{k-1};y_{0},\dots,y_{k-1}),\\[1.0ex] x_{\ell}=y_{\ell} \end{array} \right. $$

Note that whenever $\mathit {\widehat {E}}(a_{0},\dots ,a_{k-1};b_{0},\dots ,b_{k-1})$ holds true, then $\widehat {V}(a_{0},\dots ,a_{k-1})$ and $\widehat {V}(b_{0},\dots ,b_{k-1})$ will be IDB $\widehat {V}$-facts of the same class. In fact, it is sufficient to compute chordless cycles all of whose $\widehat {V}$-facts are of the same class. From here on, we write $\vec {x}$ for the sequence $\langle {x_{0},\dots ,x_{k-1}}\rangle $. Superscripts are used to create new variables: x⁽ⁱ⁾ and x^(j) are distinct variables unless i = j. Finally, ${\vec {x}}^{(i)}$ is the sequence ${x_{0}}^{(i)},\dots ,{x_{k-1}}^{(i)}$. Likewise for $\vec {y}=\langle {y_{0},\dots ,y_{k-1}}\rangle $, $\vec {z}=\langle {z_{0},\dots ,z_{k-1}}\rangle $, and $\vec {w}=\langle {w_{0},\dots ,w_{k-1}}\rangle $. Add the following rule, as well as its symmetric rule:

$$ \mathit{UCon}(\vec{x},\vec{y},{\vec{z}}^{(1)},\dots,{\vec{z}}^{(2k-3)}) \leftarrow \left\{ \begin{array}{l} \mathit{UCon}(\vec{x},\vec{w},{\vec{z}}^{(1)},\dots,{\vec{z}}^{(2k-3)}), \widehat{E}(\vec{w},\vec{y}),\\ \\ \left\{\neg\mathit{Eq}(\vec{w},{\vec{z}}^{(i)})\right\}_{i=1}^{2k-3}, \left\{\neg\widehat{E}(\vec{w},{\vec{z}}^{(i)})\right\}_{i=1}^{2k-3}\\ \\ \left\{\neg\mathit{Eq}(\vec{y},{\vec{z}}^{(i)})\right\}_{i=1}^{2k-3}, \left\{\neg\widehat{E}(\vec{y},{\vec{z}}^{(i)})\right\}_{i=1}^{2k-3} \end{array} \right. $$

$\mathit {UCon}(\vec {a},\vec {b},\vec {c}_{1},\dots ,\vec {c}_{2k-3})$ holds true if $\widehat {G}$ contains an undirected path between $\vec {a}$ and $\vec {b}$ such that no vertex on the path is equal or adjacent to some $\vec {c}_{i}$. The basis of the recursion is the following rule:

$$ \mathit{UCon}(\vec{x},\vec{x},{\vec{z}}^{(1)},\dots,{\vec{z}}^{(2k-3)}) \leftarrow \left\{ \begin{array}{l} \widehat{V}(\vec{x}),\widehat{V}({\vec{z}}^{(1)}),\dots,\widehat{V}({\vec{z}}^{(2k-3)}),\\[1.0ex] \left\{\neg\mathit{Eq}(\vec{x},{\vec{z}}^{(i)})\right\}_{i=1}^{2k-3}, \left\{\neg\widehat{E}(\vec{x},{\vec{z}}^{(i)})\right\}_{i=1}^{2k-3} \end{array} \right. $$

Finally, the following rule tests for the existence of a chordless cycle in $\widehat {G}$ of length ≥ 2k.

$$ \mathit{Chordless}() \leftarrow \left\{ \begin{array}{l} \widehat{E}({\vec{x}}^{(0)},{\vec{x}}^{(1)}),\widehat{E}({\vec{x}}^{(1)},{\vec{x}}^{(2)}),\dots,\widehat{E}({\vec{x}}^{(2k-1)},{\vec{x}}^{(2k)}),\\[1.0ex] \left\{\neg\mathit{Eq}({\vec{x}}^{(i)},{\vec{x}}^{(j)})\right\}_{1\leq i<j\leq 2k-1},\\ \\ \left\{\neg\widehat{E}({\vec{x}}^{(i)},{\vec{x}}^{(j)})\right\}_{1\leq i<i+1<j\leq 2k-1},\\ \\ \mathit{UCon}({\vec{x}}^{(0)},{\vec{x}}^{(2k)},{\vec{x}}^{(2)},\dots,{\vec{x}}^{(2k-2)}) \end{array} \right. $$

This concludes the proof. □
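The test encoded by the rules for UCon and Chordless can be mimicked by a brute-force Python sketch, given here only to illustrate the logic. The function name `has_long_chordless_cycle` and the adjacency-dictionary encoding of $\widehat {G}$ are assumptions for this example; the search is exponential and says nothing about the logspace bound.

```python
from itertools import permutations


def has_long_chordless_cycle(adj, k):
    """Brute-force analogue of the Chordless() rule on an undirected graph.

    adj maps each vertex to the set of its neighbours (symmetric, no loops).
    """
    def blocked(v, avoid):
        # v is equal or adjacent to some avoided vertex
        return any(v == c or c in adj[v] for c in avoid)

    def ucon(src, dst, avoid):
        # Undirected reachability that only uses non-blocked vertices.
        if blocked(src, avoid):
            return False
        seen, stack = {src}, [src]
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            for w in adj[v]:
                if w not in seen and not blocked(w, avoid):
                    seen.add(w)
                    stack.append(w)
        return False

    # Candidates for the elementary subpath (P_1, ..., P_{2k-1}).
    for inner in permutations(adj, 2 * k - 1):
        n = len(inner)
        is_path = all(inner[i + 1] in adj[inner[i]] for i in range(n - 1))
        chordless = all(inner[j] not in adj[inner[i]]
                        for i in range(n) for j in range(i + 2, n))
        if not (is_path and chordless):
            continue
        avoid = inner[1:-1]  # P_2, ..., P_{2k-2}
        for p0 in adj[inner[0]]:
            for p_last in adj[inner[-1]]:
                if ucon(p0, p_last, avoid):
                    return True
    return False


square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(has_long_chordless_cycle(square, 2), has_long_chordless_cycle(triangle, 2))
```

For k = 2 the square is accepted through the single-vertex path with P₀ = P₄, whereas the triangle is rejected because its only candidates for (P₁, P₂, P₃) have a chord.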

1.3 C.3 Illustration of the Datalog Program in the Proof of Lemma 9

The following example illustrates the Datalog program in the proof of Lemma 9.

Example 5

Let $q=\{R(\underline {x},y,z), S(\underline {y},x,z), U(\underline {z},a)\}$, where a is a constant. We show a program in symmetric stratified Datalog that computes the garbage set for the M-cycle $C=R(\underline {x},y,z)\stackrel {\mathsf {{~}_{M}}}{\longrightarrow } S(\underline {y},x,z)\stackrel {\mathsf {{~}_{M}}}{\longrightarrow } R(\underline {x},y,z)$. In this example, k = 2. The program is constructed as in the proof of Lemma 9.

R-facts and S-facts belong to the maximum garbage set if they do not belong to a relevant 1-embedding. This is expressed by the following rules.

$$ \begin{array}{@{}rcl@{}} \mathsf{Rlvant{R}}(x,y,z) &\leftarrow& R(x,y,z), S(y,x,z), U(z,a)\\ \mathsf{Garbage{R}}(x) &\leftarrow& R(x,y,z), \neg\mathsf{Rlvant{R}}(x,y,z)\\ \mathsf{Rlvant{S}}(y,x,z) &\leftarrow& R(x,y,z), S(y,x,z), U(z,a)\\ \mathsf{Garbage{S}}(y) &\leftarrow& S(y,x,z), \neg\mathsf{Rlvant{S}}(y,x,z) \end{array} $$

If some R-fact or S-fact of a relevant 1-embedding belongs to the maximum garbage set, then every fact of that 1-embedding belongs to the maximum garbage set. This is expressed by the following rules.

$$ \begin{array}{@{}rcl@{}} \mathsf{Garbage{R}}(x) &\leftarrow& R(x,y,z), S(y,x,z), U(z,a), \mathsf{Garbage{S}}(y)\\ \mathsf{Garbage{S}}(y) &\leftarrow& R(x,y,z), S(y,x,z), U(z,a), \mathsf{Garbage{R}}(x) \end{array} $$
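The effect of the rules given so far can be traced on a small database with the following Python sketch. It is a naive fixpoint computation written only for illustration; the function name `garbage_blocks`, the encoding of relations as sets of tuples, and the sample facts are assumptions of this example.

```python
def garbage_blocks(R, S, U, a):
    """Naive evaluation of the RlvantR/RlvantS rules and the Garbage rules so far."""
    # Triples (x, y, z) with R(x,y,z), S(y,x,z) and U(z,a): the relevant 1-embeddings.
    relevant = {(x, y, z) for (x, y, z) in R if (y, x, z) in S and (z, a) in U}
    # Facts outside every relevant 1-embedding put their whole block in the garbage.
    garbage_r = {x for (x, y, z) in R if (x, y, z) not in relevant}
    garbage_s = {y for (y, x, z) in S if (x, y, z) not in relevant}
    # Propagate garbage along relevant 1-embeddings until nothing changes.
    changed = True
    while changed:
        changed = False
        for (x, y, z) in relevant:
            if y in garbage_s and x not in garbage_r:
                garbage_r.add(x)
                changed = True
            if x in garbage_r and y not in garbage_s:
                garbage_s.add(y)
                changed = True
    return garbage_r, garbage_s


R = {(1, 2, "c"), (1, 3, "c"), (4, 5, "c")}
S = {(2, 1, "c"), (5, 4, "c")}
U = {("c", "a")}
gr, gs = garbage_blocks(R, S, U, "a")
print(gr, gs)
```

Here R(1,3,c) has no matching S-fact, so the R-block with key 1 becomes garbage, which in turn drags in the S-block with key 2; the blocks with keys 4 and 5 survive.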

Note that the predicates GarbageR and GarbageS refer to blocks: whenever a fact is added to the garbage set, its entire block is added. The following rules compute irrelevant 1-embeddings.

$$ \begin{array}{@{}rcl@{}} \mathsf{Any1Emb}(x,y,z,y^{\prime},x^{\prime},z^{\prime}) &\leftarrow& \left\{ \begin{array}{l} R(x,y,z), S(y,x,z), U(z,a),\\ R(x^{\prime},y^{\prime},z^{\prime}), S(y^{\prime},x^{\prime},z^{\prime}), U(z^{\prime},a),\\ x=x^{\prime}, y=y^{\prime} \end{array} \right.\\ \mathsf{Rel1Emb}(x,y,z,y,x,z) &\leftarrow& R(x,y,z), S(y,x,z), U(z,a)\\ \mathsf{Irr1Emb}(x,y^{\prime}) &\leftarrow& \mathsf{Any1Emb}(x,y,z,y^{\prime},x^{\prime},z^{\prime}),\\ &&\neg\mathsf{Rel1Emb}(x,y,z,y^{\prime},x^{\prime},z^{\prime}) \end{array} $$
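Read operationally, these three rules flag a pair (x, y) as soon as two relevant triples agree on x and y but differ on z. The following Python sketch mirrors the rules one by one; the function name `irrelevant_one_embeddings` and the sample facts are hypothetical.

```python
def irrelevant_one_embeddings(R, S, U, a):
    """Mirror the rules for Any1Emb, Rel1Emb and Irr1Emb on sets of tuples."""
    rel = {(x, y, z) for (x, y, z) in R if (y, x, z) in S and (z, a) in U}
    # Any1Emb(x, y, z, y', x', z') with x = x' and y = y'.
    any1 = {(x, y, z, y2, x2, z2)
            for (x, y, z) in rel for (x2, y2, z2) in rel
            if x == x2 and y == y2}
    # Rel1Emb(x, y, z, y, x, z).
    rel1 = {(x, y, z, y, x, z) for (x, y, z) in rel}
    # Irr1Emb(x, y') holds for the tuples of Any1Emb that are not in Rel1Emb.
    return {(x, y2) for (x, y, z, y2, x2, z2) in any1 - rel1}


R = {(1, 2, "c"), (1, 2, "d"), (4, 5, "c")}
S = {(2, 1, "c"), (2, 1, "d"), (5, 4, "c")}
U = {("c", "a"), ("d", "a")}
irr = irrelevant_one_embeddings(R, S, U, "a")
print(irr)
```

In this toy database the facts R(1,2,c) and S(2,1,d) form a cycle that disagrees on z, so the pair (1, 2) is reported, while (4, 5) is not.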

The predicate $\mathsf {\widehat {E}}$ is used for edges between vertices; each vertex is an (x,y)-value. The predicate Eq expresses equality of vertices.

$$ \mathsf{Eq}({x},{y},{x},{y}) \leftarrow R(x,y,z), S(y,x,z), U(z,a) $$

$$ \begin{array}{@{}rcl@{}} \mathsf{\widehat{E}}({x},{y},{x^{\prime}},{y^{\prime}}) \leftarrow \left\{ \begin{array}{l} R(x,y,z), S(y,x,z), U(z,a),\\ R(x^{\prime},y^{\prime},z^{\prime}), S(y^{\prime},x^{\prime},z^{\prime}), U(z^{\prime},a),\\ \neg\mathsf{Eq}({x},{y},{x^{\prime}},{y^{\prime}}), x=x^{\prime} \end{array} \right.\\ \mathsf{\widehat{E}}({x},{y},{x^{\prime}},{y^{\prime}}) \leftarrow \left\{ \begin{array}{l} R(x,y,z), S(y,x,z), U(z,a),\\ R(x^{\prime},y^{\prime},z^{\prime}), S(y^{\prime},x^{\prime},z^{\prime}), U(z^{\prime},a),\\ \neg\mathsf{Eq}({x},{y},{x^{\prime}},{y^{\prime}}), y=y^{\prime} \end{array} \right. \end{array} $$
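As a small sanity check of the rules for Eq and $\mathsf {\widehat {E}}$, the following Python sketch builds the same graph from sets of tuples. The function name `example_hat_graph` and the sample facts are assumptions; the negated Eq atom simply becomes an inequality test on pairs.

```python
def example_hat_graph(R, S, U, a):
    """Vertices are (x, y)-values of relevant 1-embeddings; edges share x or y."""
    vertices = {(x, y) for (x, y, z) in R if (y, x, z) in S and (z, a) in U}
    edges = {(p, q) for p in vertices for q in vertices
             if p != q and (p[0] == q[0] or p[1] == q[1])}
    return vertices, edges


R = {(1, 2, "c"), (1, 3, "c"), (4, 3, "c")}
S = {(2, 1, "c"), (3, 1, "c"), (3, 4, "c")}
U = {("c", "a")}
vertices, edges = example_hat_graph(R, S, U, "a")
print(sorted(edges))
```

The vertices (1,2) and (1,3) are adjacent because they share x, (1,3) and (4,3) because they share y, and (1,2) and (4,3) are not adjacent.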

The predicate UCon is used for undirected connectivity of the $\mathsf {\widehat {E}}$-predicate. In particular, it will be the case that UCon(a₁,b₁,a₂,b₂,a₃,b₃) holds true if there exists a path between vertices (a₁,b₁) and (a₂,b₂) such that no vertex on the path is equal or adjacent to (a₃,b₃). Recall that each vertex is itself a pair.

$$ \begin{array}{@{}rcl@{}} \mathsf{UCon}({x}_{1},{y}_{1},{x}_{1},{y}_{1},{x}_{3},{y}_{3}) &\leftarrow& \left\{ \begin{array}{l} R(x_{1},y_{1},z_{1}), S(y_{1},x_{1},z_{1}), U(z_{1},a),\\ R(x_{3},y_{3},z_{3}), S(y_{3},x_{3},z_{3}), U(z_{3},a),\\ \neg\mathsf{Eq}({x}_{1},{y}_{1},{x}_{3},{y}_{3}), \neg\mathsf{\widehat{E}}({x}_{1},{y}_{1},{x}_{3},{y}_{3}) \end{array} \right.\\ \mathsf{UCon}({x}_{1},{y}_{1},{x}_{2},{y}_{2},{x}_{3},{y}_{3}) &\leftarrow& \left\{ \begin{array}{l} \mathsf{UCon}({x}_{1},{y}_{1},{x}_{\dagger},{y}_{\dagger},{x}_{3},{y}_{3}), \mathsf{\widehat{E}}({x}_{\dagger},{y}_{\dagger},{x}_{2},{y}_{2}),\\ \neg\mathsf{Eq}({x}_{\dagger},{y}_{\dagger},{x}_{3},{y}_{3}),\neg\mathsf{\widehat{E}}({x}_{\dagger},{y}_{\dagger},{x}_{3},{y}_{3}),\\ \neg\mathsf{Eq}({x}_{2},{y}_{2},{x}_{3},{y}_{3}),\neg\mathsf{\widehat{E}}({x}_{2},{y}_{2},{x}_{3},{y}_{3}) \end{array} \right.\\ \mathsf{UCon}({x}_{1},{y}_{1},{x}_{\dagger},{y}_{\dagger},{x}_{3},{y}_{3}) &\leftarrow& \left\{ \begin{array}{l} \mathsf{UCon}({x}_{1},{y}_{1},{x}_{2},{y}_{2},{x}_{3},{y}_{3}), \mathsf{\widehat{E}}({x}_{\dagger},{y}_{\dagger},{x}_{2},{y}_{2}),\\ \neg\mathsf{Eq}({x}_{\dagger},{y}_{\dagger},{x}_{3},{y}_{3}),\neg\mathsf{\widehat{E}}({x}_{\dagger},{y}_{\dagger},{x}_{3},{y}_{3}),\\ \neg\mathsf{Eq}({x}_{2},{y}_{2},{x}_{3},{y}_{3}),\neg\mathsf{\widehat{E}}({x}_{2},{y}_{2},{x}_{3},{y}_{3}) \end{array} \right. \end{array} $$

The latter two rules are each other’s symmetric version. The following rule checks whether a vertex (a₁,b₁) belongs to a chordless $\mathsf {\widehat {E}}$-cycle of length ≥ 2k.

$$ \begin{array}{@{}rcl@{}} \mathsf{InLongUCycle}({x}_{1},{y}_{1}) \leftarrow \left\{ \begin{array}{l} \mathsf{\widehat{E}}({x}_{0},{y}_{0},{x}_{1},{y}_{1}), \mathsf{\widehat{E}}({x}_{1},{y}_{1},{x}_{2},{y}_{2}),\\ \mathsf{\widehat{E}}({x}_{2},{y}_{2},{x}_{3},{y}_{3}), \mathsf{\widehat{E}}({x}_{3},{y}_{3},{x}_{4},{y}_{4}),\\ \neg\mathsf{\widehat{E}}({x}_{1},{y}_{1},{x}_{3},{y}_{3}),\\ \neg\mathsf{Eq}({x}_{1},{y}_{1},{x}_{2},{y}_{2}), \neg\mathsf{Eq}({x}_{1},{y}_{1},{x}_{3},{y}_{3}), \neg\mathsf{Eq}({x}_{2},{y}_{2},{x}_{3},{y}_{3}),\\ \mathsf{UCon}({x}_{0},{y}_{0},{x}_{4},{y}_{4},{x}_{2},{y}_{2}) \end{array} \right. \end{array} $$

The following rules add to the maximum garbage set all R-facts and S-facts that belong to an irrelevant 1-embedding or to a strong component of the ↪ _C-graph that contains an elementary ↪ _C-cycle of length ≥ 2k. Whenever a fact is added, all facts of its block are added.

$$ \begin{array}{@{}rcl@{}} \mathsf{Garbage{R}}(x) &\leftarrow& \mathsf{InLongUCycle}(x,y)\\ \mathsf{Garbage{S}}(y) &\leftarrow& \mathsf{InLongUCycle}(x,y)\\ \mathsf{Garbage{R}}(x) &\leftarrow& \mathsf{Irr1Emb}(x,y)\\ \mathsf{Garbage{S}}(y) &\leftarrow& \mathsf{Irr1Emb}(x,y) \end{array} $$

This completes the computation of the garbage set. In general, we have to check the existence of elementary ↪ _C-cycles of length nk with 2 ≤ n ≤ 2k − 3. However, for k = 2, no such n exists.

1.4 C.4 Proof of Lemma 10

Proof of Lemma 10

Let $q^{\prime }=({q\setminus C})\cup \{T\}$. For every $i\in \{0,1,\dots ,k-1\}$, let F_i = $R_{i}(\underline {\vec {x}_{i}},\vec {y}_{i})$. Here is an informal visual representation of the different queries involved:

Proof of the First Item We show the existence of a reduction from CERTAINTY(q) to the problem ${\mathsf {CERTAINTY}}({q^{\prime }\cup p})$ that is expressible in ${\mathit {SymStratDatalog}}^{\min \limits }$. We first describe the reduction, and then show that it can be expressed in ${\mathit {SymStratDatalog}}^{\min \limits }$.

Let db₀ be a database that is input to CERTAINTY(q). By Lemma 9, we can compute in symmetric stratified Datalog the maximum garbage set o for C in db₀. Let db = db₀ ∖o. We know, by Lemma 2, that the problem CERTAINTY(q) has the same answer on instances db₀ and db. Moreover, by Lemma 3, every garbage set for C in db is empty, which implies, by Lemma 5, that (i) every n-embedding of C in db must be a relevant 1-embedding, and (ii) every fact A with genre_q(A) ∈ C belongs to a 1-embedding. The reduction will now encode all these 1-embeddings as T-facts.

We show that every directed edge of the ↪ _C-graph belongs to a directed cycle. To this end, take any edge A↪ _CB. Since every garbage set for C in db is empty, the ↪ _C-graph contains a relevant 1-embedding containing A, and a relevant 1-embedding containing B. Let $A^{\prime }$ be the fact such that $A^{\prime }\stackrel {{~}_{C}}{\hookrightarrow }B$ is a directed edge in the 1-embedding containing B. Let $B^{\prime }$ be the fact such that $A\stackrel {{~}_{C}}{\hookrightarrow }B^{\prime }$ is a directed edge in the 1-embedding containing A. Since A↪ _CB and $A\stackrel {{~}_{C}}{\hookrightarrow }B^{\prime }$, it follows $B\sim B^{\prime }$ by Lemma 4. From $A^{\prime }\stackrel {{~}_{C}}{\hookrightarrow }B$ and $B\sim B^{\prime }$, it follows $A^{\prime }\stackrel {{~}_{C}}{\hookrightarrow }B^{\prime }$. Thus, the ↪ _C-graph contains a directed path from B to $A^{\prime }$, an edge from $A^{\prime }$ to $B^{\prime }$, and a directed path from $B^{\prime }$ to A. Consequently, the ↪ _C-graph contains a directed path from B to A.

It follows that every strong component of the ↪ _C-graph is initial. It can be easily seen that if an initial strong component contains some fact A, then it contains every fact that is key-equal to A. Let r be a repair of db. For every fact A ∈r, there exists a unique fact B ∈r such that A↪ _CB. It follows that r must contain an elementary ↪ _C-cycle, which must be a relevant 1-embedding (because every garbage set for C in db is empty) belonging to the same initial strong component as A. It can also be seen that there exists a repair that contains exactly one such 1-embedding for every strong component of the ↪ _C-graph.

We define an undirected graph G as follows: for each valuation μ over vars(q) such that $\mu (q)\subseteq \mathbf {db}$, we introduce a vertex 𝜃 with 𝜃 = μ[vars(C)]. We add an edge between two vertices 𝜃 and $\theta ^{\prime }$ if for some $i\in \{0,\dots ,k-1\}$, $\theta (\vec {x}_{i})=\theta ^{\prime }(\vec {x}_{i})$. The graph G can clearly be constructed in logarithmic space (and even in FO). We define a set db_T of T-facts and, for every $i\in \{0,\dots ,k-1\}$, a set db_i as follows: for all two vertices 𝜃, $\theta ^{\prime }$ of G, if

$$ \theta^{\prime}(\vec{x}_{0})=\min\left\{\theta^{\prime\prime}(\vec{x}_{0})\mid \theta^{\prime\prime}\in V(G) \text{ belongs to the same strong component as } \theta \right\}, $$

then we add to db_T the fact ${\theta }_{[{{u}\mapsto {\theta ^{\prime }(\vec {x}_{0})}}]}(T)$, and we add to db_i the fact ${\theta }_{[{{u}\mapsto {\theta ^{\prime }(\vec {x}_{0})}}]}(N_{i})$. In this way, every db_i is consistent. Informally, if T is the atom $T(\underline {u},\vec {w})$, then we add to db_T the T-fact $T(\underline {\theta ^{\prime }(\vec {x}_{0})},\theta (\vec {w}))$, where $\theta ^{\prime }(\vec {x}_{0})$ is treated as a single value. This fact represents that 𝜃 belongs to the strong component that is identified by $\theta ^{\prime }(\vec {x}_{0})$. Since undirected connectivity can be computed in logarithmic space [33], db_T and each db_i can be constructed in logarithmic space.
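The construction of the component identifiers can be sketched in Python as follows. The sketch uses an ordinary union-find structure instead of a logspace connectivity algorithm, so it illustrates only what is computed, not the complexity bound. The function name `encode_components`, the encoding of valuations as dictionaries, and the sample valuations (with x₀ = (x) and x₁ = (y)) are assumptions of this example.

```python
def encode_components(thetas, key_vars):
    """Pair every valuation with the least x_0-value of its connected component.

    thetas   -- list of valuations over vars(C), each a dict variable -> value
    key_vars -- key_vars[i] is the tuple of variables forming the key x_i
    """
    parent = list(range(len(thetas)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two valuations are adjacent if they agree on some key x_i.
    for xs in key_vars:
        first_seen = {}
        for idx, theta in enumerate(thetas):
            key = tuple(theta[v] for v in xs)
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    # The identifier of a component is the minimum x_0-value occurring in it.
    smallest = {}
    for idx, theta in enumerate(thetas):
        root = find(idx)
        x0 = tuple(theta[v] for v in key_vars[0])
        smallest[root] = min(smallest.get(root, x0), x0)
    return [(smallest[find(idx)], theta) for idx, theta in enumerate(thetas)]


thetas = [
    {"x": 1, "y": 2, "z": "c"},
    {"x": 1, "y": 3, "z": "c"},
    {"x": 4, "y": 3, "z": "c"},
    {"x": 7, "y": 8, "z": "c"},
]
encoded = encode_components(thetas, [("x",), ("y",)])
print([u for u, _ in encoded])
```

Each returned pair (u, θ) corresponds to one T-fact: the first three valuations are connected and share the identifier (1,), while the fourth forms its own component with identifier (7,).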

Let db_C be the set of all F_i-facts in db (0 ≤ i ≤ k − 1), and let $\mathbf {db}_{{\mathsf {shared}}}\mathrel {\mathop :}=\mathbf {db}\setminus \mathbf {db}_{C}$, the part of the database db that is preserved by the reduction. Let $\mathbf {db}_{N}=\bigcup _{i=0}^{k-1}\mathbf {db}_{i}$. Since db_N is consistent, db_shared ⊎db_T ⊎db_N is a legal input to ${\mathsf {CERTAINTY}}({q^{\prime }\cup p})$, where the use of ⊎ (rather than ∪) indicates that the operands of the union are disjoint. Here is an informal visual representation of the reduction:

We show that the following are equivalent:

1.
Every repair of db satisfies q.
2.
For every s ∈rset(db_shared), for every repair r_T of db_T, ${\mathbf {s}}\uplus {\mathbf {r}}_{T}\uplus \mathbf {db}_{N}\models q^{\prime }\cup p$.
3.
Every repair of db_shared ⊎db_T ⊎db_N satisfies $q^{\prime }\cup p$.

The equivalence 2 ⇔ 3 is straightforward. We show next the equivalence 1 ⇔ 2.

1 ⇒ 2 Let s ∈ rset(db_shared) and let r_T be a repair of db_T. By our construction of db_T, there exists a repair r_C of db_C such that for every valuation 𝜃 over vars(q), if $\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}$, then for some value c, ${\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}$. Informally, r_C contains all (and only) the relevant 1-embeddings of C in s ∪ r_C that are encoded by the T-facts of r_T. Since s ∪ r_C is a repair of db, by the hypothesis 1, we can assume a valuation 𝜃 over vars(q) such that $\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}$. Consequently, for some value c, ${\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}$.

2 ⇒ 1 Let r be a repair of db. There exist s ∈ rset(db_shared) and r_C ∈ rset(db_C) such that r = s ∪ r_C. By the construction of db_T, there exists a repair r_T of db_T such that for every valuation 𝜃 over vars(q), if ${\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}$ for some c, then $\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}$ (note incidentally that the converse does not generally hold). Informally, for every strong component $\mathcal {S}$ of the ↪ _C-graph of db such that ${\mathbf {s}}\cup ({{\mathbf {r}}_{C}\cap V(\mathcal {S})})\models q$, the set r_T encodes one 1-embedding of C in ${\mathbf {s}}\cup ({{\mathbf {r}}_{C}\cap V(\mathcal {S})})$. Here, $V(\mathcal {S})$ denotes the vertex set of the strong component $\mathcal {S}$; thus $V(\mathcal {S})\subseteq \mathbf {db}_{C}$. Since s ∪ r_T ∪ db_N is a repair of db_shared ⊎ db_T ⊎ db_N, it follows by the hypothesis 2 that there exists a valuation 𝜃 over vars(q) such that ${\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}$ for some c. Consequently, $\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}$.

In the main body of this article, we have shown a program in ${\mathit {SymStratDatalog}}^{\min \limits }$ that computes the reduction.

Proof of the Second Item Assume that the attack graph of q contains no strong cycle and that some initial strong component of the attack graph contains every atom of $\{F_{0},F_{1},\dots ,F_{k-1}\}$. Since all N_i-facts have mode c, they have no outgoing attacks in the attack graph of $q^{\prime }\cup p$. Since ${\mathsf {vars}}({N_{i}})\subseteq {\mathsf {vars}}({T})$ for every atom N_i ∈ p, we can limit our analysis to witnesses for attacks that do not contain any N_i. Indeed, if N_i occurs in a witness, it can be replaced with T. Let $\mathcal {S}$ be an initial strong component of the attack graph of q that contains every atom of $\{F_{0},F_{1},\dots ,F_{k-1}\}$. We will use the following properties:

(a)
For all $X,Y\subseteq \mathsf {vars}({q})$, if ${\mathcal {K}}({q})\models {X}\rightarrow {Y}$, then ${\mathcal {K}}({q^{\prime }\cup p})\models {X}\rightarrow {Y}$. This holds true because ${\mathcal {K}}({q^{\prime }\cup p})\models {\mathcal {K}}({q})$. To prove the latter claim, note that ${\mathcal {K}}({q})\setminus {\mathcal {K}}({q^{\prime }\cup p})=\{{{\mathsf {key}}({F_{i}})}\rightarrow {{\mathsf {vars}}({F_{i}})}\}_{i=0}^{k-1}$. For all $i\in \{0,1,\dots ,k-1\}$, we have that ${\mathcal {K}}({\{T,N_{i}\}})\equiv \{{u}\rightarrow {\mathsf {vars}({C})},{{\mathsf {key}}({F_{i}})}\rightarrow {u}\}$ with ${\mathsf {vars}}({F_{i}})\subseteq \mathsf {vars}({C})$. Consequently, ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({F_{i}})}\rightarrow {{\mathsf {vars}}({F_{i}})}$.
(b)
As an immediate consequence of (a), we have ${H}^{+,{q}}\subseteq {H}^{+,{q^{\prime }\cup p}}$ for every H ∈ q ∖ C.
(c)
For every H ∈ q ∖ C, if $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T$, then $H\in \mathcal {S}$. To show this result, let H ∈ q ∖ C such that $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T$. We can assume without loss of generality the existence of a witness for $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T$ of the form $\omega \stackrel {v}{\smallfrown }T$ with v≠u, where the sequence ω starts with H. We can assume the existence of $j\in \{0,\dots ,k-1\}$ such that v ∈vars(F_j). From the preceding property (b), it follows that the sequence $\omega \stackrel {v}{\smallfrown }F_{j}$ is a witness for $H\overset {q}{\rightsquigarrow }F_{j}$. Since $F_{j}\in \mathcal {S}$, we conclude $H\in \mathcal {S}$.
(d)
For all $G,H\!\in \!\mathcal {S}$, we have ${\mathcal {K}}({q^{\prime }\!\cup \! p})\!\models \!{{\mathsf {key}}({G})}\!\rightarrow \!{{\mathsf {key}}({H})}$. To show this result, let $G,H\in \mathcal {S}$. Since $\mathcal {S}$ is an initial strong component of the attack graph of q, there exists an elementary attack cycle that contains both G and H. Since the attack graph of q contains no strong cycle, for every edge $J\overset {q}{\rightsquigarrow }J^{\prime }$ on this attack cycle, we have ${\mathcal {K}}({q})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({J^{\prime }})}$. It can now be easily seen that ${\mathcal {K}}({q})\models {{\mathsf {key}}({G})}\rightarrow {{\mathsf {key}}({H})}$. Finally, by property (a), ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({G})}\rightarrow {{\mathsf {key}}({H})}$.
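The implication used in property (a) can be checked mechanically on a toy instance with the standard attribute-closure algorithm for functional dependencies. In the Python sketch below, the function name `closure`, the variable names, and the choice vars(C) = {x, y, z} with key(F₀) = {x} and key(F₁) = {y} are hypothetical and serve only as an illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of functional dependencies (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Toy instance: K({T, N_0, N_1}) is taken to be {u -> vars(C), x -> u, y -> u}.
vars_C = {"x", "y", "z"}
fds = [({"u"}, vars_C), ({"x"}, {"u"}), ({"y"}, {"u"})]
closed = closure({"x"}, fds)
print(sorted(closed))
```

Since the closure of {x} contains x, y and z, the dependency key(F₀) → vars(F₀) follows from the dependencies of T and N₀, which is the step that property (a) relies on.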

We know by [21, Lemma 3.6] that if the attack graph contains a strong cycle, then it contains a strong cycle of length 2. Therefore, to conclude the proof, it suffices to show that every cycle of length 2 in the attack graph of $q^{\prime }\cup p$ is weak. To this end, assume that the attack graph of $q^{\prime }\cup p$ contains an attack cycle $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$. Then, either H≠T or J≠T (or both). We assume without loss of generality that H≠T. We show that the attack cycle $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$ is weak. We distinguish three cases.

Case that $H\stackrel {q^{\prime }\cup p}{\not \rightsquigarrow }T$ (therefore J≠T) and $J\stackrel {q^{\prime }\cup p}{\not \rightsquigarrow }T$.

Then no witness for $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J$ or $J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$ can contain T. By property (b), $H\stackrel {q}{\rightsquigarrow }J\stackrel {q}{\rightsquigarrow }H$. Since the attack graph of q contains no strong attack cycle, ${\mathcal {K}}({q})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({J})}$ and ${\mathcal {K}}({q})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({H})}$. Then, by property (a), ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({J})}$ and ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({H})}$. It follows that the attack cycle $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$ is weak.

Case that $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T$:

By property (c), $H\in \mathcal {S}$. We distinguish two cases.

Case that J = T:

By property (d), ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({F_{0}})}$ and ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({F_{0}})}\rightarrow {{\mathsf {key}}({H})}$. In the following, recall that {u} = key(T). Since $\mathcal {K}({q^{\prime }\cup p})\models \mathsf {key}(F_{0}) \rightarrow u$ and $\mathcal {K}({q^{\prime }\cup p})\models u \rightarrow \mathsf {key}(F_{0})$ hold by the construction of $q^{\prime }\cup p$, we conclude ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {u}$ and ${\mathcal {K}}({q^{\prime }\cup p})$ $\models {u}\rightarrow {{\mathsf {key}}({H})}$. It follows that the attack cycle $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$ is weak.

Case that J≠T:

We show that $J\in \mathcal {S}$ by distinguishing two cases:

If $J\stackrel {q^{\prime }\cup p}{\not \rightsquigarrow }T$, then no witness for $J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$ contains T. Then, by property (b), any witness for $J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$ is also a witness for $J\overset {q}{\rightsquigarrow }H$, and therefore $J\in \mathcal {S}$.
If $J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T$, then $J\in \mathcal {S}$ by property (c).

From $H,J\in \mathcal {S}$, it follows ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({J})}$ and ${\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({H})}$ by property (d). It follows that the attack cycle $H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H$ is weak.

Case that $J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T$ (therefore J≠T):

This case is symmetrical to a case that has already been treated.

□

Appendix D: Proofs of Section 8.1

D.1 Proof of Lemma 11

We will use two helping lemmas.

Lemma 16

[35, Lemma 4.3] Let q be a self-join-free Boolean conjunctive query, and r a consistent database. If α₁,α₂ are valuations over vars(q) such that $\alpha _{1}(q)\subseteq {\mathbf {r}}$ and $\alpha _{2}(q)\subseteq {\mathbf {r}}$, then {α₁,α₂} satisfies every functional dependency in ${\mathcal {K}}({q})$.

Lemma 17

Let q be a query in sjfBCQ. Let ${Z}\rightarrow {w}$ be a functional dependency that is internal to q. Let $\vec {z}$ be a sequence of distinct variables such that $\mathsf {vars}({\vec {z}})=Z$. Let $q^{\prime }=q\cup \{N^{\mathsf {c}}(\underline {\vec {z}},w)\}$ where N is a fresh relation name of mode c. Then,

1. there exists a first-order reduction from CERTAINTY(q) to ${\mathsf {CERTAINTY}}({q^{\prime }})$; and
2. if the attack graph of q contains no strong cycle, then the attack graph of $q^{\prime }$ contains no strong cycle.

Proof of the first item

By the second condition in Definition 8, we can assume an atom F ∈ q such that $Z\subseteq {\mathsf {vars}}({F})$. Let $F_{1},F_{2},\dots ,F_{\ell }$ be a sequential proof for ${\mathcal {K}}({q})\models {Z}\rightarrow {w}$ such that for every $i\in \{1,\dots ,\ell \}$, for every u ∈ Z ∪{w}, $F_{i}\stackrel {q}{\not \rightsquigarrow }u$. It can be easily seen that for every $i\in \{0,\dots ,\ell -1\}$, we have

$$ {\mathcal{K}}({\{F_{j}\}_{j=1}^{i}})\models{Z}\rightarrow{{\mathsf{key}}({F_{i+1}})}. \tag{4} $$
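The entailment displayed above can be checked mechanically on small examples. The following is a minimal sketch, assuming the usual reading of a sequential proof in which every atom contributes the dependency key(F) → vars(F) and may be used only once its key variables are already determined. The encoding of atoms as pairs of variable sets and the function name `is_sequential_proof` are illustrative assumptions, and the additional requirement that no atom of the proof attacks a variable of Z ∪ {w} is not checked here.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: an atom is (key variables, all variables).
Atom = Tuple[FrozenSet[str], FrozenSet[str]]


def is_sequential_proof(atoms: Dict[str, Atom], proof: List[str],
                        Z: Set[str], w: str) -> bool:
    """Check that `proof` derives Z -> w from the dependencies key(F) -> vars(F)."""
    known = set(Z)  # variables determined by Z so far
    for name in proof:
        key, all_vars = atoms[name]
        # The key of the next atom must already be determined; this is the
        # condition that the displayed entailment expresses for each prefix.
        if not key <= known:
            return False
        known |= all_vars
    return w in known


# Toy query {R(x, y), S(y, z)} with primary keys x and y, respectively.
atoms = {
    "R": (frozenset({"x"}), frozenset({"x", "y"})),
    "S": (frozenset({"y"}), frozenset({"y", "z"})),
}
```

On this toy query, the sequence R, S is accepted as a proof of x → z, whereas S, R is rejected because the key of S is not determined by x alone.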

Let db be a database that is the input to CERTAINTY(q). We repeat the following “purification” step: If for two valuations over vars(q), denoted β₁ and β₂, we have $\beta _{1}(q),\beta _{2}(q)\subseteq \mathbf {db}$ and $\{\beta _{1},\beta _{2}\}\not \models {Z}\rightarrow {w}$, then we remove both the F-block containing β₁(F) and the F-block containing β₂(F). Note that β₁(F) and β₂(F) may be key-equal, and hence belong to the same F-block.

Assume that we apply this step on $\mathbf {db}^{\prime }$ and obtain $\mathbf {db}^{\prime \prime }$. We show that some repair of $\mathbf {db}^{\prime }$ falsifies q if and only if some repair of $\mathbf {db}^{\prime \prime }$ falsifies q. The ⇒-direction trivially holds true. For the ⇐-direction, let ${\mathbf {r}}^{\prime \prime }$ be a repair of $\mathbf {db}^{\prime \prime }$ that falsifies q. Assume, toward a contradiction, that every repair of $\mathbf {db}^{\prime }$ satisfies q. For every repair r, define Reify(r) as the set of valuations over Z ∪ {w} containing θ if r ⊧ θ(q). Let

$$ {\mathbf{r}}^{\prime}= \left\{ \begin{array}{ll} {\mathbf{r}}^{\prime\prime}\cup\{\beta_{j}(F)\} \text{ for some } j\in\{1,2\} & \text{if } \beta_{1}(F) \text{ and } \beta_{2}(F) \text{ are key-equal}\\ {\mathbf{r}}^{\prime\prime}\cup\{\beta_{1}(F),\beta_{2}(F)\} & \text{otherwise} \end{array} \right. $$

Note that if β₁(F) and β₂(F) are key-equal, then we can choose either ${\mathbf {r}}^{\prime }={\mathbf {r}}^{\prime \prime }\cup \{\beta _{1}(F)\}$ or ${\mathbf {r}}^{\prime }={\mathbf {r}}^{\prime \prime }\cup \{\beta _{2}(F)\}$; the actual choice does not matter. Obviously, ${\mathbf {r}}^{\prime }$ is a repair of $\mathbf {db}^{\prime }$. Since we assumed that every repair of $\mathbf {db}^{\prime }$ satisfies q, we can assume a valuation α over vars(q) such that $\alpha (q)\subseteq {\mathbf {r}}^{\prime }$. Since $\alpha (q)\nsubseteq {\mathbf {r}}^{\prime \prime }$ (because ${\mathbf {r}}^{\prime \prime }\not \models q$), it must be the case that for some j ∈{1, 2}, α(F) = β_j(F). From $\mathsf {vars}({\vec {z}})=Z\subseteq {\mathsf {vars}}({F})$, it follows that $\alpha (\vec {z})=\beta _{j}(\vec {z})$. From $\beta _{1}(\vec {z})=\beta _{2}(\vec {z})$, it follows $\alpha (\vec {z})=\beta _{1}(\vec {z})$ and $\alpha (\vec {z})=\beta _{2}(\vec {z})$. Since β₁(w)≠β₂(w), either α(w)≠β₁(w) or α(w)≠β₂(w) (or both). Therefore, we can assume b ∈{1, 2} such that α(w)≠β_b(w). It will be the case that ${\textsf {Reify}}({{\mathbf {r}}^{\prime }})=\{\alpha [Z\cup \{w\}]\}$.^{Footnote 2} Indeed, since α is an arbitrary valuation over vars(q) such that $\alpha (q)\subseteq {\mathbf {r}}^{\prime }$, it follows that for all valuations α₁,α₂ over vars(q), if $\alpha _{1}(q),\alpha _{2}(q)\subseteq {\mathbf {r}}^{\prime }$, then $\alpha _{1}(\vec {z})=\alpha _{2}(\vec {z})$ and therefore, by Lemma 16 and using that ${\mathcal {K}}({q})\models {Z}\rightarrow {w}$, we have α₁(w) = α₂(w).

We now claim that for all $i\in \{0,1,\dots ,\ell \}$, there exists a pair $({\mathbf {r}}^{\prime i},\alpha ^{i})$ such that

1. ${\mathbf {r}}^{\prime i}$ is a repair of $\mathbf {db}^{\prime }$;
2. $\alpha ^{i}$ is a valuation over vars(q) such that $\alpha ^{i}(q)\subseteq {\mathbf {r}}^{\prime i}$;
3. $\alpha ^{i}(\{F_{j}\}_{j=1}^{i})=\beta _{b}(\{F_{j}\}_{j=1}^{i})$ and $\alpha ^{i}(\vec {z})=\beta _{b}(\vec {z})$ (and therefore $\alpha ^{i}(\vec {z})=\alpha (\vec {z})$);
4. $\alpha ^{i}(w)=\alpha (w)$; and
5. ${\textsf {Reify}}({{\mathbf {r}}^{\prime i}})=\{\alpha [Z\cup \{w\}]\}$.

The third condition entails $\{\alpha ^{i},\beta _{b}\}\models {\mathcal {K}}({\{F_{j}\}_{j=1}^{i}})$ for all $i\in \{0,1,\dots ,\ell \}$. From (4), it follows $\{\alpha ^{i},\beta _{b}\}\models {Z}\rightarrow {{\mathsf {key}}({F_{i+1}})}$. Then, from $\alpha ^{i}(\vec {z})=\beta _{b}(\vec {z})$, it follows that $\alpha ^{i}$ and $\beta _{b}$ agree on all variables of ${\mathsf {key}}({F_{i+1}})$.

The proof of the above claim runs by induction on increasing i. For the basis of the induction, i = 0, the desired result holds by choosing ${\mathbf {r}}^{\prime 0}={\mathbf {r}}^{\prime }$ and α⁰ = α.

For the induction step, $i\rightarrow i+1$, the induction hypothesis is that the desired pair $({\mathbf {r}}^{\prime i},\alpha ^{i})$ exists. Since $\alpha ^{i}$ and $\beta _{b}$ agree on all variables of ${\mathsf {key}}({F_{i+1}})$, we have that $\alpha ^{i}(F_{i+1})$ and $\beta _{b}(F_{i+1})$ are key-equal. From $\beta _{b}(q)\subseteq \mathbf {db}^{\prime }$, it follows that $\beta _{b}(F_{i+1})\in \mathbf {db}^{\prime }$. Let ${\mathbf {r}}^{\prime i+1}=\left ({{\mathbf {r}}^{\prime i}\setminus \{\alpha ^{i}(F_{i+1})\}}\right )\cup \{\beta _{b}(F_{i+1})\}$, which is obviously a repair of $\mathbf {db}^{\prime }$. Since $F_{i+1}\stackrel {q}{\not \rightsquigarrow }u$ for all u ∈ Z ∪ {w}, ${\textsf {Reify}}({{\mathbf {r}}^{\prime i+1}})\subseteq {\textsf {Reify}}({{\mathbf {r}}^{\prime i}})$ by [21, Lemma B.1]. Since we assumed that every repair of $\mathbf {db}^{\prime }$ satisfies q, we have that ${\textsf {Reify}}({{\mathbf {r}}^{\prime i+1}})\neq \emptyset $, and therefore $\textsf {Reify}({{\mathbf {r}}^{\prime i+1}})=\{\alpha [Z\cup \{w\}]\}$. Hence, there exists a valuation $\alpha ^{i+1}$ over vars(q) such that $\alpha ^{i+1}(q)\subseteq {\mathbf {r}}^{\prime i+1}$ and $\alpha ^{i+1}[Z\cup \{w\}]=\alpha [Z\cup \{w\}]$, that is, $\alpha ^{i+1}(\vec {z})=\alpha (\vec {z})$ and $\alpha ^{i+1}(w)=\alpha (w)$. Since $\alpha (\vec {z})=\beta _{b}(\vec {z})$, we have $\alpha ^{i+1}(\vec {z})=\beta _{b}(\vec {z})$. We have thus shown that the pair $({\mathbf {r}}^{\prime i+1},\alpha ^{i+1})$ satisfies items 1, 2, 4, and 5 in the above five-item list; we also have shown the second conjunct of item 3. In the next paragraph, we show that $\alpha ^{i+1}(\{F_{j}\}_{j=1}^{i+1})=\beta _{b}(\{F_{j}\}_{j=1}^{i+1})$, i.e., the first conjunct of item 3.

By the induction hypothesis, $\alpha ^{i}(\{F_{j}\}_{j=1}^{i})=\beta _{b}(\{F_{j}\}_{j=1}^{i})$ and $\alpha ^{i}(q)\subseteq {\mathbf {r}}^{\prime i}$, which implies $\beta _{b}(\{F_{j}\}_{j=1}^{i})\subseteq {\mathbf {r}}^{\prime i}$. Since ${\mathbf {r}}^{\prime i}$ and ${\mathbf {r}}^{\prime i+1}$ include the same set of $F_{j}$-facts for every $j\in \{1,\dots ,i\}$, we have $\beta _{b}(\{F_{j}\}_{j=1}^{i})\subseteq {\mathbf {r}}^{\prime i+1}$. Since $\beta _{b}(F_{i+1})\in {\mathbf {r}}^{\prime i+1}$ by construction, we obtain $\beta _{b}(\{F_{j}\}_{j=1}^{i+1})\subseteq {\mathbf {r}}^{\prime i+1}$. Since also $\alpha ^{i+1}(\{F_{j}\}_{j=1}^{i+1})\subseteq {\mathbf {r}}^{\prime i+1}$ (because $\alpha ^{i+1}(q)\subseteq {\mathbf {r}}^{\prime i+1}$), it is correct to conclude that $\{\beta _{b},\alpha ^{i+1}\}\models {\mathcal {K}}({\{F_{j}\}_{j=1}^{i+1}})$ by Lemma 16. We are now ready to show that $\alpha ^{i+1}(F_{j})=\beta _{b}(F_{j})$ for all $j\in \{1,\dots ,i+1\}$. To this end, pick any $k\in \{1,\dots ,i+1\}$. We have ${\mathcal {K}}({\{F_{j}\}_{j=1}^{k-1}})\models {Z}\rightarrow {{\mathsf {key}}({F_{k}})}$ by (4). Since $\{F_{j}\}_{j=1}^{k-1}$ is a subset of $\{F_{j}\}_{j=1}^{i+1}$, we have $\{\beta _{b},\alpha ^{i+1}\}\models {\mathcal {K}}({\{F_{j}\}_{j=1}^{k-1}})$, and therefore $\{\beta _{b},\alpha ^{i+1}\}\models {Z}\rightarrow {{\mathsf {key}}({F_{k}})}$. Then, from $\alpha ^{i+1}(\vec {z})=\beta _{b}(\vec {z})$ (the second conjunct of item 3), it follows that $\alpha ^{i+1}$ and $\beta _{b}$ agree on all variables of ${\mathsf {key}}({F_{k}})$. Since $\alpha ^{i+1}(F_{k}),\beta _{b}(F_{k})\in {\mathbf {r}}^{\prime i+1}$, it must be the case that $\alpha ^{i+1}(F_{k})=\beta _{b}(F_{k})$. This concludes the induction step.

For the pair $({\mathbf {r}}^{\prime \ell },\alpha ^{\ell })$, we have that $\alpha ^{\ell }(\{F_{j}\}_{j=1}^{\ell })=\beta _{b}(\{F_{j}\}_{j=1}^{\ell })$, and therefore, since w occurs in some $F_{j}$, $\alpha ^{\ell }(w)=\beta _{b}(w)$. Since also $\alpha ^{\ell }(w)=\alpha (w)$, we obtain $\alpha (w)=\beta _{b}(w)$, a contradiction. We conclude by contradiction that some repair of $\mathbf {db}^{\prime }$ falsifies q. Thus, the purification step described in the paragraph immediately following (4) does not change the answer to CERTAINTY(q).

We repeat the “purification” step until it can no longer be applied. Let the final database be $\widehat {\mathbf {db}}$. By the above reasoning, we have that every repair of $\widehat {\mathbf {db}}$ satisfies q if and only if every repair of db satisfies q. Let s be the smallest set of N-facts containing $N(\underline {\beta (\vec {z})},\beta (w))$ for every valuation β over vars(q) such that $\beta (q)\subseteq \widehat {\mathbf {db}}$. We show that s is consistent. To this end, let β₁,β₂ be valuations over vars(q) such that $\beta _{1}(q),\beta _{2}(q)\subseteq \widehat {\mathbf {db}}$ and $\beta _{1}(\vec {z})=\beta _{2}(\vec {z})$. If β₁(w)≠β₂(w), then a purification step can remove the block containing β₁(F), contradicting our assumption that no purification step is applicable on $\widehat {\mathbf {db}}$. We conclude by contradiction that β₁(w) = β₂(w).

Since N has mode c and s is consistent, we have that $\widehat {\mathbf {db}}\cup {\mathbf {s}}$ is a legal database. It can now be easily seen that every repair of db satisfies q if and only if every repair of $\widehat {\mathbf {db}}\cup {\mathbf {s}}$ satisfies $q^{\prime }=q\cup \{N^{\mathsf {c}}(\underline {\vec {z}},w)\}$.
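The construction described above can be illustrated on a toy instance. The following is a minimal sketch, assuming a brute-force enumeration of valuations and a hypothetical encoding of atoms and facts as tuples; the names `embeddings`, `purify`, and `n_facts` are not from the article. The sketch removes, in a single pass, the F-blocks of all facts that occur in a pair of embeddings violating Z → w, and then builds the N-facts from the remaining embeddings. It does not check that Z → w is internal to q.

```python
from itertools import product

# Hypothetical toy encoding: an atom is (relation, variables, key length) and
# a fact is (relation, values); the first `key length` positions form the key.


def embeddings(query, db):
    """Return every valuation beta over vars(q) with beta(q) contained in db."""
    per_atom = [[f for f in db if f[0] == rel] for rel, _, _ in query]
    result = []
    for choice in product(*per_atom):
        beta, ok = {}, True
        for (_, variables, _), (_, values) in zip(query, choice):
            for var, val in zip(variables, values):
                # A variable must receive the same value wherever it occurs.
                if beta.setdefault(var, val) != val:
                    ok = False
        if ok:
            result.append(beta)
    return result


def purify(query, db, f_rel, Z, w):
    """Drop the F-blocks of facts occurring in a pair that violates Z -> w."""
    f_vars, f_keylen = next((v, k) for rel, v, k in query if rel == f_rel)
    betas = embeddings(query, db)
    bad_keys = set()
    # Ordered pairs are enumerated, so recording the key of b1 covers both facts.
    for b1, b2 in product(betas, repeat=2):
        if all(b1[z] == b2[z] for z in Z) and b1[w] != b2[w]:
            bad_keys.add(tuple(b1[v] for v in f_vars[:f_keylen]))
    return {f for f in db
            if not (f[0] == f_rel and f[1][:f_keylen] in bad_keys)}


def n_facts(query, db, Z, w):
    """Build one fact N(beta(z), beta(w)) per embedding beta of q in db."""
    return {("N", tuple(b[z] for z in Z) + (b[w],))
            for b in embeddings(query, db)}


# Toy query {R(x, y), S(y, z)} with keys x and y; F = R, Z = (x,), w = z.
query = [("R", ("x", "y"), 1), ("S", ("y", "z"), 1)]
db = {("R", ("a", "b")), ("R", ("a", "c")), ("R", ("d", "b")),
      ("S", ("b", 1)), ("S", ("c", 2))}
clean = purify(query, db, "R", ("x",), "z")
```

In this toy instance the two embeddings through the R-block with key a agree on x and disagree on z, so that block is removed, while R(d, b) and both S-facts are kept. Calling `n_facts(query, clean, ("x",), "z")` then yields the single fact N(d, 1), which trivially satisfies the key of N.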

It remains to be argued that the reduction is in FO, i.e., that the result of the repeated “purification” step can be obtained by a single first-order query. Let $\mathsf {vars}({q})=\{x_{1},\dots ,x_{n}\}$. Let $q^{*}(x_{1},\dots ,x_{n})\mathrel {\mathop :}=\bigwedge _{G\in q}G$ be the quantifier-free part of the first-order formula expressing the Boolean query q. For every $i\in \{1,\dots ,n\}$, let $x_{i}^{\prime }$ be a fresh variable. Let $\vec {u}$ be a sequence of distinct variables such that $\mathsf {vars}({\vec {u}})={\mathsf {vars}}({F})$. The following query finds all F-facts whose blocks can be removed:

$$\left \{\vec{u}\mid\exists^{*}\left({q^{*}(x_{1},\dots,x_{n})\land q^{*}(x_{1}^{\prime},\dots,x_{n}^{\prime})\land\left({\bigwedge_{z\in Z}z=z^{\prime}}\right)\land w\neq w^{\prime}}\right)\right\},$$

where the existential quantification ranges over all variables not in $\vec {u}$. The F-facts that are to be preserved are not key-equal to a fact in the preceding query and can obviously be computed in FO. This concludes the proof of the first item.

Proof of the second item

Assume that the attack graph of q contains no strong cycle. We will show that the attack graph of $q^{\prime }$ contains no strong cycle either. By the second item in Definition 8, we can assume an atom G ∈ q such that $Z\subseteq {\mathsf {vars}}({G})$. Note that the atom $N^{\mathsf {c}}(\underline {\vec {z}},w)$ has no outgoing attacks because its mode is c. It is sufficient to show that for every F,H ∈ q, if there exists a witness for $F\stackrel {q^{\prime }}{\rightsquigarrow }H$, then there exists a witness for $F\stackrel {q^{\prime }}{\rightsquigarrow }H$ that does not contain $N^{\mathsf {c}}(\underline {\vec {z}},w)$. To this end, assume that a witness for $F\stackrel {q^{\prime }}{\rightsquigarrow }H$ contains

$$ \dotsm F^{\prime}\stackrel{u^{\prime}}{\smallfrown}N^{\mathsf{c}}(\underline{\vec{z}},w)\stackrel{u^{\prime\prime}}{\smallfrown}F^{\prime\prime}\dotsm, \tag{5} $$

where $u^{\prime }$ and $u^{\prime \prime }$ are distinct variables. We can assume without loss of generality that this is the only occurrence of $N^{\mathsf {c}}(\underline {\vec {z}},w)$ in the witness. In this case, we have $F\overset {q}{\rightsquigarrow }u^{\prime }$. If $u^{\prime },u^{\prime \prime }\in Z$, then we can replace $N^{\mathsf {c}}(\underline {\vec {z}},w)$ with G. So the only nontrivial case is where either $u^{\prime }=w$ or $u^{\prime \prime }=w$ (but not both). Then, it must be the case that ${\mathcal {K}}({q^{\prime }\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {w}$, and therefore also

$$ {\mathcal{K}}({q\setminus\{F\}})\not\models{{\mathsf{key}}({F})}\rightarrow{w}. \tag{6} $$

Since ${Z}\rightarrow {w}$ is internal to q, there exists a sequential proof for ${\mathcal {K}}({q})\models {Z}\rightarrow {w}$ such that no atom in the proof attacks a variable in Z ∪{w}. Let $J_{1},J_{2},\dots ,J_{\ell }$ be a shortest such proof. Because $F\overset {q}{\rightsquigarrow }u^{\prime }$ and $u^{\prime } \in Z \cup \{w\}$, it must be that $F\not \in \{J_{1},\dots ,J_{\ell }\}$. We can assume that w occurs at a non-primary-key position in J_ℓ. Because of (6), we can assume the existence of a variable v ∈key(J_ℓ) such that ${\mathcal {K}}({q\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {v}$. If v∉Z, then there exists k < ℓ such that v occurs at a non-primary-key position in J_k. Again, we can assume a variable $v^{\prime }\in {\mathsf {key}}({J_{k}})$ such that ${\mathcal {K}}({q\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {v^{\prime }}$. By repeating the same reasoning, there exists a sequence

$$ \stackrel{z_{i_{0}}}{\smallfrown}J_{i_{0}} \stackrel{z_{i_{1}}}{\smallfrown}J_{i_{1}} \stackrel{z_{i_{2}}}{\smallfrown} {\dots} \stackrel{z_{i_{m}}}{\smallfrown}J_{i_{m}} \stackrel{w}{\smallfrown} $$

where $1\leq i_{0}<i_{1}<\dotsm <i_{m}=\ell $ such that

$z_{i_{0}}\in Z$;
for all $j\in \{0,\dots ,m\}$, ${\mathcal {K}}({q\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {z_{i_{j}}}$; and
for all $j\in \{1,\dots ,m\}$, $z_{i_{j}}\in {\mathsf {vars}}({J_{i_{j-1}}})\cap {\mathsf {vars}}({J_{i_{j}}})$. In particular, $z_{i_{j}}\in {\mathsf {key}}({J_{i_{j}}})$.

We can assume G ∈ q such that $Z\subseteq {\mathsf {vars}}({G})$. Let $u\in \{u^{\prime },u^{\prime \prime }\}$ such that u≠w. Thus, $\{u,w\}=\{u^{\prime },u^{\prime \prime }\}$. It can now be easily seen that a witness for $F\stackrel {q^{\prime }}{\rightsquigarrow }H$ can be obtained by replacing $N^{\mathsf {c}}(\underline {\vec {z}},w)$ in (5) with the following sequence or its reverse:

$$ \stackrel{u}{\smallfrown}G \stackrel{z_{i_{0}}}{\smallfrown}J_{i_{0}} \stackrel{z_{i_{1}}}{\smallfrown}J_{i_{1}} \stackrel{z_{i_{2}}}{\smallfrown} {\dots} \stackrel{z_{i_{m}}}{\smallfrown}J_{i_{m}} \stackrel{w}{\smallfrown} $$

This concludes the proof of Lemma 17. □

The proof of Lemma 11 is now straightforward.

Proof of Lemma 11

Repeated application of Lemma 17. □

D.2 Proof of Lemma 12

We will use the following helping lemma.

Lemma 18

Let q be a query in sjfBCQ such that q is saturated and the attack graph of q contains no strong cycle. Let $\mathcal {S}$ be an initial strong component in the attack graph of q with $\left |{\mathcal {S}}\right |\geq 2$. For every atom $F \in \mathcal {S}$, there exists an atom $H \in \mathcal {S}$ such that $F \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } H$.

Proof

Assume $F \in \mathcal {S}$. Since F belongs to an initial strong component with at least two atoms, there exists $G \in \mathcal {S}$ such that $F\overset {q}{\rightsquigarrow }G$ and the attack is weak. Therefore, ${\mathcal {K}}({q})\models {{\mathsf {key}}({F})}\rightarrow {{\mathsf {key}}({G})}$. It follows that ${\mathcal {K}}({q\setminus \{F\}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({G})}$. Let $\sigma = H_{1}, H_{2}, \dots , H_{\ell }$ be a sequential proof for ${\mathcal {K}}({q\setminus \{F\}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({G})}$, where $F \notin \{H_{1}, \dots , H_{\ell }\}$. We can assume without loss of generality that H_ℓ = G.

Let j be the smallest index in $\{1, \dots , \ell \}$ such that $H_{j} \in \mathcal {S}$. Since $H_{\ell } \in \mathcal {S}$, such an index always exists. Then, $\sigma = H_{1}, H_{2}, \dots , H_{j-1}$ is a sequential proof for ${\mathcal {K}}({q\setminus \{F\}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({H_{j}})}$ (observe that this proof may be empty). By our choice of j, for every $i\in \{1,\dots ,j-1\}$, we have $H_{i} \notin \mathcal {S}$, and hence $H_{i}$ cannot attack F or $H_{j}$ (since $\mathcal {S}$ is an initial strong component). It follows that no atom in σ attacks a variable in ${\mathsf {vars}}({F})\cup {\mathsf {key}}({H_{j}})$. Since q is saturated, this implies that ${\mathcal {K}}({{q}^{\mathsf {cons}}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({H_{j}})}$, and so $F \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } H_{j}$. □

The proof of Lemma 12 can now be given.

Proof of Lemma 12

Starting from some atom $F_{0} \in \mathcal {S}$, by repeatedly applying Lemma 18, we can create an infinite sequence $F_{0} \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } F_{1} \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } F_{2} \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } \dotsm $ such that for every i ≥ 1, $F_{i} \in \mathcal {S}$ and $F_{i}\neq F_{i+1}$. Since the atoms in $\mathcal {S}$ are finitely many, there will exist some i,j such that i < j and $F_{i}=F_{j+1}$. It follows that the M-graph of q contains a cycle all of whose atoms belong to $\mathcal {S}$. □
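The pigeonhole argument in this proof is constructive. The following is a minimal sketch, assuming that the choice of a successor guaranteed by Lemma 18 is available as a function on atoms; the name `find_cycle` and the dictionary-based successor are illustrative assumptions only.

```python
def find_cycle(start, successor):
    """Follow successor() from start until an atom repeats; return the cycle."""
    position = {}  # atom -> index of its first visit on the path
    path = []
    current = start
    while current not in position:
        position[current] = len(path)
        path.append(current)
        current = successor(current)
    # `current` was visited before, so the path from its first visit closes up.
    return path[position[current]:]


# Hypothetical M-successors inside an initial strong component {F0, F1, F2}.
m_successor = {"F0": "F1", "F1": "F2", "F2": "F1"}.get
cycle = find_cycle("F0", m_successor)
```

For this choice of successors, the walk F0, F1, F2, F1 stops at the first repeated atom, and `cycle` is the list `["F1", "F2"]`. The loop terminates after at most as many steps as there are atoms in the component, which is the finiteness used in the proof.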

About this article

Cite this article

Koutris, P., Wijsen, J. Consistent Query Answering for Primary Keys in Datalog. Theory Comput Syst 65, 122–178 (2021). https://doi.org/10.1007/s00224-020-09985-6

Published: 30 June 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s00224-020-09985-6
