1 Introduction

Many forms of inductive logic programming (ILP) (Albarghouthi et al. 2017; Campero et al. 2018; Cropper and Muggleton 2019; Emde et al. 1983; Evans and Grefenstette 2018; Flener 1996; Kaminski et al. 2018; Kietz and Wrobel 1992; Muggleton et al. 2015; De Raedt and Bruynooghe 1992; Si et al. 2018; Wang et al. 2014) use second-order Horn clauses, called metarules, as a form of declarative bias (De Raedt 2012). Metarules define the structure of learnable programs, which in turn defines the hypothesis space. For instance, to learn the grandparent/2 relation given the parent/2 relation, the chain metarule would be suitable:

$$\begin{aligned} P(A,B) \leftarrow Q(A,C), R(C,B) \end{aligned}$$

In this metarule, the letters P, Q, and R denote existentially quantified second-order variables (variables that can be bound to predicate symbols) and the letters A, B, and C denote universally quantified first-order variables (variables that can be bound to constant symbols). Given the chain metarule, the background parent/2 relation, and examples of the grandparent/2 relation, ILP approaches will try to find suitable substitutions for the existentially quantified second-order variables, such as the substitutions {P/grandparent, Q/parent, R/parent}, to induce the theory:

$$\begin{aligned} grandparent (A,B) \leftarrow parent (A,C), parent (C,B) \end{aligned}$$

However, despite the widespread use of metarules, there is little work determining which metarules to use for a given learning task. Instead, suitable metarules are assumed to be given as part of the background knowledge, and are often used without any theoretical justification. Deciding which metarules to use for a given learning task is a major open challenge (Cropper 2017; Cropper and Muggleton 2014) and is a trade-off between efficiency and expressivity: the hypothesis space grows given more metarules (Cropper and Muggleton 2014; Lin et al. 2014), so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. For instance, it is impossible to learn the grandparent/2 relation using only metarules with monadic predicates.

In this paper, we study whether potentially infinite fragments of metarules can be logically reduced to minimal, or irreducible, finite subsets, where a fragment is a syntactically restricted subset of a logical theory (Bradley and Manna 2007).

Cropper and Muggleton (2014) first studied this problem. They used Progol’s entailment reduction algorithm (Muggleton 1995) to identify entailment reduced sets of metarules, where a clause C is entailment redundant in a clausal theory \(T \cup \{C\}\) when \(T \models C\). To illustrate entailment redundancy, consider the following first-order clausal theory \(T_1\), where p, q, r, and s are first-order predicates:

$$\begin{aligned} \begin{aligned} C_1&= p(A,B) \leftarrow q(A,B) \\ C_2&= p(A,B) \leftarrow q(A,B),r(A) \\ C_3&= p(A,B) \leftarrow q(A,B),r(A),s(B,C) \end{aligned} \end{aligned}$$

In \(T_1\) the clauses \(C_2\) and \(C_3\) are entailment redundant because they are both logical consequences of \(C_1\), i.e. \(\{C_1\} \models \{C_2,C_3\}\). Because \(\{C_1\}\) cannot be reduced, it is a minimal entailment reduction of \(T_1\).

Cropper and Muggleton showed that in some cases as few as two metarules are sufficient to entail an infinite fragment of chained second-order dyadic Datalog (Cropper and Muggleton 2014). They also showed that learning with minimal sets of metarules improves predictive accuracies and reduces learning times compared to non-minimal sets. To illustrate how a finite subset of metarules could entail an infinite set, consider the set of metarules with only monadic literals and a single first-order variable A:

$$\begin{aligned} \begin{aligned}&M_1 = P(A) \leftarrow T_1(A)\\&M_2 = P(A) \leftarrow T_1(A),T_2(A)\\&M_3 = P(A) \leftarrow T_1(A),T_2(A),T_3(A)\\&\dots \\&M_{n} = P(A) \leftarrow T_1(A),T_2(A),\dots ,T_{n}(A)\\&\dots \\ \end{aligned} \end{aligned}$$

Although this set is infinite, it can be entailment reduced to the single metarule \(M_1\), because \(M_1\) subsumes, and therefore entails, every other metarule in the set.

However, in this paper, we claim that entailment reduction is not always the most appropriate form of reduction. For instance, suppose you want to learn the father/2 relation given the background relations parent/2, male/1, and female/1. Then a suitable hypothesis is:

$$\begin{aligned} father (A,B) \leftarrow parent (A,B), male (A) \end{aligned}$$

To learn such a hypothesis one would need a metarule of the form \(P(A,B) \leftarrow Q(A,B),R(A)\). Now suppose you have the metarules:

$$\begin{aligned} \begin{aligned} M_1&= P(A,B) \leftarrow Q(A,B) \\ M_2&= P(A,B) \leftarrow Q(A,B),R(A) \end{aligned} \end{aligned}$$

Running entailment reduction on these metarules would remove \(M_2\) because it is a logical consequence of \(M_1\). However, it is impossible to learn the intended father/2 relation given only \(M_1\). As this example shows, entailment reduction can be too strong because it can remove metarules necessary to specialise a clause, where \(M_2\) can be seen as a specialisation of \(M_1\).

To address this issue, we introduce derivation reduction, a new form of reduction based on derivations, which we claim is a more suitable form of reduction for reducing sets of metarules. Let \(\vdash \) denote derivability in SLD-resolution (Kowalski 1974). Then a Horn clause C is derivationally redundant in a Horn theory \(T \cup \{C\}\) when \(T \vdash C\). A Horn theory is derivationally irreducible if it contains no derivationally redundant clauses. To illustrate the difference between entailment and derivation reduction, consider the metarules:

$$\begin{aligned} \begin{aligned} M_1&= P(A,B) \leftarrow Q(A,B)\\ M_2&= P(A,B) \leftarrow Q(A,B),R(A)\\ M_3&= P(A,B) \leftarrow Q(A,B),R(A,B)\\ M_4&= P(A,B) \leftarrow Q(A,B),R(A,B),S(A,B) \end{aligned} \end{aligned}$$

Running entailment reduction on these metarules would result in the reduction \(\{M_1\}\) because \(M_1\) entails the rest of the theory. Likewise, running subsumption reduction (Plotkin 1971) (described in detail in Sect. 3.5) would also result in the reduction \(\{M_1\}\). By contrast, running derivation reduction would only remove \(M_4\) because it can be derived by self-resolving \(M_3\). The remaining metarules \(M_2\) and \(M_3\) are not derivationally redundant because there is no way to derive them from the other metarules.
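To make the self-resolution step concrete, take a variable-renamed copy \(M_3' = P'(A,B) \leftarrow Q'(A,B),R'(A,B)\) of \(M_3\) and resolve the body literal Q(A,B) of \(M_3\) with the head of \(M_3'\), binding Q to \(P'\). The resolvent is:

$$\begin{aligned} P(A,B) \leftarrow Q'(A,B),R'(A,B),R(A,B) \end{aligned}$$

which is \(M_4\) up to variable renaming.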

1.1 Contributions

In the rest of this paper, we study whether fragments of metarules relevant to ILP can be logically reduced to minimal finite subsets. We study three forms of reduction: subsumption (Robinson 1965), entailment (Muggleton 1995), and derivation (Cropper and Tourret 2018). We also study how learning with reduced sets of metarules affects learning performance. To do so, we supply Metagol (Cropper and Muggleton 2016b), a meta-interpretive learning (MIL) (Cropper and Muggleton 2016a; Muggleton et al. 2014, 2015) implementation, with different reduced sets of metarules and measure the resulting learning performance on three domains: Michalski trains (Larson and Michalski 1977), string transformations, and game rules (Cropper et al. 2019). In general, using derivation reduced sets of metarules outperforms using subsumption and entailment reduced sets, both in terms of predictive accuracies and learning times. Overall, our specific contributions are:

  • We describe the logical reduction problem (Sect. 3).

  • We describe subsumption and entailment reduction, and introduce derivation reduction, the problem of removing derivationally redundant clauses from a clausal theory (Sect. 3).

  • We study the decidability of the three reduction problems and show, for instance, that the derivation reduction problem is undecidable for arbitrary Horn theories (Sect. 3).

  • We introduce two general reduction algorithms that take a reduction relation as a parameter. We also study their complexity (Sect. 4).

  • We run the reduction algorithms on finite sets of metarules to identify minimal sets (Sect. 5).

  • We theoretically show whether infinite fragments of metarules can be logically reduced to finite sets (Sect. 5).

  • We experimentally compare the learning performance of Metagol when supplied with reduced sets of metarules on three domains: Michalski trains, string transformations, and game rules (Sect. 6).

2 Related work

This section describes work related to this paper, mostly work on logical reduction techniques. We first, however, describe work related to MIL and metarules.

2.1 Meta-interpretive learning

Although the study of metarules has implications for many ILP approaches (Albarghouthi et al. 2017; Campero et al. 2018; Cropper and Muggleton 2019; Emde et al. 1983; Evans and Grefenstette 2018; Flener 1996; Kaminski et al. 2018; Kietz and Wrobel 1992; Muggleton et al. 2015; De Raedt and Bruynooghe 1992; Si et al. 2018; Wang et al. 2014), we focus on meta-interpretive learning (MIL), a form of ILP based on a Prolog meta-interpreter. The key difference between a MIL learner and a standard Prolog meta-interpreter is that whereas a standard Prolog meta-interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given goal, a MIL learner additionally attempts to prove a goal by fetching second-order metarules, supplied as background knowledge (BK), whose heads unify with the goal. The resulting meta-substitutions are saved and can be reused in later proofs. Following the proof of a set of goals, a logic program is formed by projecting the meta-substitutions onto their corresponding metarules, allowing for a form of ILP which supports predicate invention and learning recursive theories.
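For concreteness, the following is a minimal Prolog sketch of such a meta-interpreter in the style of Metagol. The representation is our illustrative assumption (atoms as lists, metarules stored as metarule/4 facts, background predicates marked by prim/1); the real Metagol additionally bounds the program size and reuses previously saved meta-substitutions.

% chain metarule: P(A,B) <- Q(A,C), R(C,B), with the second-order
% variables P, Q, R represented as ordinary Prolog variables
metarule(chain, [P,Q,R], [P,A,B], [[Q,A,C],[R,C,B]]).

% background knowledge
prim(parent).
parent(ann,amy).
parent(amy,bob).

% prove a list of atoms, accumulating a program of meta-substitutions
prove([], Prog, Prog).
prove([Atom|Atoms], Prog1, Prog2) :-
    prove_aux(Atom, Prog1, Prog3),
    prove(Atoms, Prog3, Prog2).

% case 1: the atom is a background atom, so call it directly
prove_aux([P|Args], Prog, Prog) :-
    prim(P),
    Goal =.. [P|Args],
    call(Goal).
% case 2: fetch a metarule whose head unifies with the atom, save the
% meta-substitution, and prove the body
prove_aux(Atom, Prog1, Prog2) :-
    metarule(Name, Subs, Atom, Body),
    prove(Body, [sub(Name,Subs)|Prog1], Prog2).

For example, the query prove([[grandparent,ann,bob]], [], Prog) succeeds with Prog = [sub(chain,[grandparent,parent,parent])], which projects back onto the chain metarule to give the grandparent theory from the introduction.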

Most existing work on MIL has assumed suitable metarules as input to the problem, or has used metarules without any theoretical justification. In this paper, we try to address this issue by identifying minimal sets of metarules for interesting fragments of logic, such as Datalog, from which a MIL system can theoretically learn any logic program.

2.2 Metarules

McCarthy (1995) and Lloyd (2003) advocated using second-order logic to represent knowledge. Similarly, Muggleton et al. (2012) argued that using second-order representations in ILP provides more flexible ways of representing BK compared to existing methods. Metarules are second-order Horn clauses and are used as a form of declarative bias (Nédellec et al. 1996; De Raedt 2012) to determine the structure of learnable programs which in turn defines the hypothesis space. In contrast to other forms of declarative bias, such as modes (Muggleton 1995) or grammars (Cohen 1994), metarules are logical statements that can be reasoned about, such as to reason about the redundancy of sets of metarules, which we explore in this paper.

Metarules were introduced in the Blip system (Emde et al. 1983). Kietz and Wrobel (1992) studied generality measures for metarules in the RDT system. A generality order is necessary because the RDT system searches the hypothesis space (which is defined by the metarules) in a top-down general-to-specific order. A key difference between RDT and MIL is that whereas RDT requires metarules of increasing complexity (e.g. rules with an increasing number of literals in the body), MIL derives more complex metarules through SLD-resolution. This point is important because this ability allows MIL to start from smaller sets of primitive metarules. In this paper we try to identify such primitive sets.

Using metarules to build a logic program is similar to the use of refinement operators in ILP (Nienhuys-Cheng and de Wolf 1997; Shapiro 1983) to build a definite clause literal-by-literal. As with refinement operators, it seems reasonable to ask about completeness and irredundancy of a set of metarules, which we explore in this paper.

2.3 Logical redundancy

Detecting and eliminating redundancy in a clausal theory is useful in many areas of computer science. In ILP logically reducing a theory is useful to remove redundancy from a hypothesis space to improve learning performance (Cropper and Muggleton 2014; Fonseca et al. 2004). In general, simplifying or reducing a theory often makes a theory easier to understand and use, and may also have computational efficiency advantages.

2.3.1 Literal redundancy

Plotkin (1971) used subsumption to decide whether a literal is redundant in a first-order clause. Joyner (1976) independently investigated the same problem, which he called clause condensation, where a condensation of a clause C is a minimum cardinality subset \(C'\) of C such that \(C' \models C\). Gottlob and Fermüller (1993) improved Joyner’s algorithm and also showed that determining whether a clause is condensed is co-NP-complete. In contrast to removing redundant literals, we focus on removing redundant clauses.

2.3.2 Clause redundancy

Plotkin (1971) introduced methods to decide whether a clause is subsumption redundant in a first-order clausal theory. This problem has also been extensively studied in the context of first-order logic with equality due to its application in superposition-based theorem proving (Hillenbrand et al. 2013; Weidenbach and Wischnewski 2010). The same problem, and slight variants, has been extensively studied in the propositional case (Liberatore 2005, 2008). Removing redundant clauses has numerous applications, such as to improve the efficiency of SAT (Heule et al. 2015). In contrast to these works, we focus on reducing theories formed of second-order Horn clauses (without equality), which to our knowledge has not yet been extensively explored. Another difference is that we additionally study redundancy based on SLD-derivations.

Cropper and Muggleton (2014) used Progol’s entailment-reduction algorithm (Muggleton 1995) to identify irreducible sets of metarules. Their approach removed entailment redundant clauses from sets of metarules. They identified theories that are (1) entailment complete for certain fragments of second-order Horn logic, and (2) irreducible. They demonstrated that in some cases as few as two clauses are sufficient to entail an infinite theory. However, they only considered small and highly constrained fragments of metarules. In particular, they focused on an exactly-two-connected fragment of metarules where each literal is dyadic and each first-order variable appears exactly twice in distinct literals. However, as discussed in the introduction, entailment reduction is not always the most appropriate form of reduction because it can remove metarules necessary to specialise a clause. Therefore, in this paper, we go beyond entailment reduction and introduce derivation reduction. We also consider more general fragments of metarules, such as a fragment of metarules sufficient to learn Datalog programs.

Cropper and Tourret (2018) introduced the derivation reduction problem and studied whether sets of metarules could be derivationally reduced. They considered the exactly-two-connected fragment previously considered by Cropper and Muggleton and a two-connected fragment in which every variable appears at least twice, which is analogous to our singleton-free fragment (Sect. 5.3). They used graph theoretic methods to show that certain fragments could not be completely derivationally reduced. They demonstrated on the Michalski trains dataset that the partially derivationally reduced set of metarules outperforms the entailment reduced set. In similar work Cropper and Tourret elaborated on their graph theoretic techniques and expanded the results to unconstrained resolution (Tourret and Cropper 2019).

In this paper, we go beyond the work of Cropper and Tourret (2018) in several ways. First, we consider more general fragments of metarules, including connected and Datalog fragments. We additionally consider fragments with zero-arity literals. In all cases we provide additional theoretical results showing whether certain fragments can be reduced, and, where possible, show the actual reductions. Second, Tourret and Cropper (2019) focused on derivation reduction modulo first-order variable unification, i.e. they considered the case where factorisation (Nienhuys-Cheng and de Wolf 1997) was allowed when resolving two clauses, which is not implemented in practice in current MIL systems. For this reason, although Section 5 in Tourret and Cropper (2019) and Sect. 5.1 in the present paper seemingly consider the same problem, the results are opposite to one another. Third, in addition to entailment and derivation reduction, we also consider subsumption reduction. We provide more theoretical results on the decidability of the reduction problems, such as showing a decidable case for derivation reduction (Theorem 4). Fourth, we describe the reduction algorithms and discuss their computational complexity. Finally, we corroborate the experimental results of Cropper and Tourret on Michalski’s train problem (Cropper and Tourret 2018) and provide additional experimental results on two more domains: real-world string transformations and inducing Datalog game rules from observations.

2.3.3 Theory minimisation

We focus on removing clauses from a clausal theory. A related yet distinct topic is theory minimisation where the goal is to find a minimum equivalent formula to a given input formula. This topic is often studied in propositional logic (Hemaspaandra and Schnoor 2011). The minimisation problem allows for the introduction of new clauses. By contrast, the reduction problem studied in this paper does not allow for the introduction of new clauses and instead only allows for the removal of redundant clauses.

2.3.4 Prime implicates

Implicates of a theory T are the clauses that are entailed by T and are called prime when they do not themselves entail other implicates of T. This notion differs from subsumption and derivation reduction because it focuses on entailment, and it differs from entailment reduction because (1) prime implicates have been studied only in propositional, first-order, and some modal logics (Bienvenu 2007; Echenim et al. 2015; Marquis 2000); and (2) the generation of prime implicates allows for the introduction of new clauses into the formula.

3 Logical reduction

We now introduce the reduction problem: the problem of finding redundant clauses in a theory. We first describe the reduction problem starting with preliminaries, and then describe three instances of the problem. The first two instances are based on existing logical reduction methods: subsumption and entailment. The third instance is a new form of reduction introduced in Cropper and Tourret (2018) based on SLD-derivations.

3.1 Preliminaries

We assume familiarity with logic programming notation (Lloyd 1987) but we restate some key terminology. A clause is a disjunction of literals. A clausal theory is a set of clauses. A Horn clause is a clause with at most one positive literal. A Horn theory is a set of Horn clauses. A definite clause is a Horn clause with exactly one positive literal. A Horn clause is a Datalog clause if (1) it contains no function symbols, and (2) every variable that appears in the head of the clause also appears in a positive (i.e. not negated) literal in the body of the clause. We denote the powerset of the set S as \(2^S\).

3.1.1 Metarules

Although the reduction problem applies to any clausal theory, we focus on theories formed of metarules:

Definition 1

(Metarule) A metarule is a second-order Horn clause of the form:

$$\begin{aligned} A_0 \leftarrow A_1, \; \dots \;, \; \; A_m \end{aligned}$$

where each \(A_i\) is a literal of the form \(P(T_1,\dots ,T_n )\) where P is either a predicate symbol or a second-order variable that can be substituted by a predicate symbol, and each \(T_i\) is either a constant symbol or a first-order variable that can be substituted by a constant symbol.

Table 1 Example metarules

Table 1 shows a selection of metarules commonly used in the MIL literature (Cropper and Muggleton 2015, 2016a, 2019; Cropper et al. 2015; Morel et al. 2019). As Definition 1 states, metarules may include predicate and constant symbols. However, we focus on the more general case where metarules only contain variables. In addition, although metarules can be any Horn clauses, we focus on definite clauses with at least one body literal, i.e. we disallow facts, because their inclusion leads to uninteresting reductions, where in almost all such cases the theories can be reduced to a single fact. We denote the infinite set of all such metarules as \({{{\mathscr {M}}}}^{{}}_{}\). We focus on fragments of \({{{\mathscr {M}}}}^{{}}_{}\), where a fragment is a syntactically restricted subset of a theory (Bradley and Manna 2007):

Definition 2

(The fragment \({{{\mathscr {M}}}}^{{a}}_{m}\)) We denote as \({{{\mathscr {M}}}}^{{a}}_{m}\) the fragment of \({{{\mathscr {M}}}}^{{}}_{}\) where each literal has arity at most a and each clause has at most m literals in the body. We replace a by the explicit set of arities when we restrict the allowed arities further.

Example 1

\({{{\mathscr {M}}}}^{{\{2\}}}_{2}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity 2 and each clause has at most 2 body literals.

Example 2

\({{{\mathscr {M}}}}^{{\{2\}}}_{m}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity 2 and each clause has at most m body literals.

Example 3

\({{{\mathscr {M}}}}^{{\{0,2\}}}_{m}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity 0 or 2 and each clause has at most m body literals.

Example 4

\({{{\mathscr {M}}}}^{{a}}_{\{1,2\}}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity at most a and each clause has either 1 or 2 body literals.

Let T be a clausal theory. Then we say that T is in the fragment \({{{\mathscr {M}}}}^{{a}}_{m}\) if and only if each clause in T is in \({{{\mathscr {M}}}}^{{a}}_{m}\).

3.2 Meta-interpretive learning

In Sect. 6 we conduct experiments to see whether using reduced sets of metarules can improve learning performance. The primary purpose of the experiments is to test our claim that entailment reduction is not always the most appropriate form of reduction. Our experiments focus on MIL. To keep the paper self-contained, we briefly describe MIL.

Definition 3

(MIL input) A MIL input is a tuple \((B,E^+,E^-,M)\) where:

  • B is a set of Horn clauses denoting background knowledge

  • \(E^+\) and \(E^-\) are disjoint sets of ground atoms representing positive and negative examples respectively

  • M is a set of metarules

The MIL problem is defined from a MIL input:

Definition 4

(MIL problem) Given a MIL input \((B,E^+,E^-,M)\), the MIL problem is to return a logic program hypothesis H such that:

  • \(\forall c \in H, \exists m \in M\) such that \(c=m\theta \), where \(\theta \) is a substitution that grounds all the existentially quantified variables in m

  • \(H \cup B \models E^{+}\)

  • \(H \cup B \not \models E^{-}\)

We call H a solution to the MIL problem.

The metarules and background knowledge define the hypothesis space. To explain our experimental results in Sect. 6, it is important to understand the effect that metarules have on the size of the MIL hypothesis space, and thus on learning performance. The following result generalises previous results (Cropper and Muggleton 2016a; Lin et al. 2014):

Theorem 1

(MIL hypothesis space) Given p predicate symbols and k metarules in \({{{\mathscr {M}}}}^{{a}}_{m}\), the number of programs expressible with n clauses is at most \((p^{m+1}k)^n\).

Proof

The number of first-order clauses which can be constructed from a \({{{\mathscr {M}}}}^{{a}}_{m}\) metarule given p predicate symbols is at most \(p^{m+1}\) because for a given metarule there are at most \(m+1\) predicate variables with at most \(p^{m+1}\) possible substitutions. Therefore the set of such clauses S which can be formed from k distinct metarules in \({{{\mathscr {M}}}}^{{a}}_{m}\) using p predicate symbols has cardinality at most \(p^{m+1}k\). It follows that the number of programs which can be formed from a selection of n clauses chosen from S is at most \((p^{m+1}k)^n\). \(\square \)
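To illustrate the bound, suppose we have \(p=10\) predicate symbols and \(k=5\) metarules in \({{{\mathscr {M}}}}^{{2}}_{2}\) (so \(m=2\)). Then the number of programs expressible with \(n=4\) clauses is at most \((10^{2+1} \cdot 5)^{4} = 5000^{4} = 6.25 \times 10^{14}\).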

Theorem 1 shows that the MIL hypothesis space increases given more metarules. The Blumer bound (Blumer et al. 1987) says that given two hypothesis spaces, searching the smaller space will result in fewer errors compared to the larger space, assuming that the target hypothesis is in both spaces. This result suggests that we should consider removing redundant metarules to improve learning performance. We explore this idea in the rest of the paper.

3.3 Encapsulation

To reason about metarules (especially when running the Prolog implementations of the reduction algorithms), we use a method called encapsulation (Cropper and Muggleton 2014) to transform a second-order logic program to a first-order logic program. We first define encapsulation for atoms:

Definition 5

(Atomic encapsulation) Let A be a second-order or first-order atom of the form \(P(T_{1},\dots ,T_{n})\). Then \(enc(A) = enc(P,T_{1},\dots ,T_{n})\) is the encapsulation of A.

For instance, the encapsulation of the atom parent(ann,andy) is enc(parent,ann,andy). Note that encapsulation essentially ignores the quantification of variables in metarules by treating all variables, including predicate variables, as first-order universally quantified variables of the first-order enc predicate. In particular, replacing existential quantifiers with universal quantifiers on predicate variables is fine for our work because we only reason about the form of metarules, not their semantics, i.e. we treat metarules as templates for first-order clauses. We extend atomic encapsulation to logic programs:

Definition 6

(Program encapsulation) The logic program enc(P) is the encapsulation of the logic program P, formed by replacing every atom A in P with enc(A).

For example, the encapsulation of the metarule \(P(A,B) \leftarrow Q(A,C), R(C,B)\) is \(enc(P,A,B) \leftarrow enc(Q,A,C), enc(R,C,B)\). We extend encapsulation to interpretations (Nienhuys-Cheng and de Wolf 1997) of logic programs:

Definition 7

(Interpretation encapsulation) Let I be an interpretation over the predicate and constant symbols in a logic program. Then the encapsulated interpretation enc(I) is formed by replacing each atom A in I by enc(A).
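As a minimal Prolog sketch of Definitions 5 and 6 (the enc_atom/2 name is ours), atomic encapsulation of first-order atoms can be implemented with univ:

% enc_atom(+Atom, -Enc): encapsulate a first-order atom, e.g.
% enc_atom(parent(ann,andy), E) gives E = enc(parent,ann,andy)
enc_atom(Atom, Enc) :-
    Atom =.. [P|Args],      % decompose the atom into its symbol and arguments
    Enc  =.. [enc, P|Args]. % rebuild it under the enc predicate

A second-order atom such as P(A,B) cannot be written directly in standard Prolog because a functor must be a constant, which is precisely why metarules are written in encapsulated form from the start, e.g. the chain metarule becomes the ordinary first-order clause enc(P,A,B) :- enc(Q,A,C), enc(R,C,B).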

We now have the proposition:

Proposition 1

[Encapsulation models (Cropper and Muggleton 2014)] The second-order logic program P has a model M if and only if enc(P) has the model enc(M).

Proof

Follows trivially from the definitions of encapsulated programs and interpretations.

\(\square \)

We can extend the definition of entailment to logic programs:

Proposition 2

[Entailment (Cropper and Muggleton 2014)] Let P and Q be second-order logic programs. Then \(P\models Q\) if and only if every model enc(M) of enc(P) is also a model of enc(Q).

Proof

Follows immediately from Proposition 1. \(\square \)

These results allow us to reason about metarules using standard first-order logic. In the rest of the paper all the reasoning about second-order theories is performed at the first-order level. However, to aid readability, we continue to write non-encapsulated metarules, i.e. we will continue to refer to sets of metarules as second-order theories.

3.4 Logical reduction problem

We now describe the logical reduction problem. For the clarity of the paper, and to avoid repeating definitions for each form of reduction that we consider (entailment, subsumption, and derivability), we describe a general reduction problem which is parametrised by a binary relation \(\sqsubset \) defined over any clausal theory, although in the case of derivability, \(\sqsubset \) is in fact only defined over Horn clauses. Our only constraint on the relation \(\sqsubset \) is that if \(A\sqsubset {}B\), \(A\subseteq A'\) and \(B'\subseteq B\) then \(A'\sqsubset {}B'\). We first define a redundant clause:

Definition 8

(\(\sqsubset \)-redundant clause) The clause C is \(\sqsubset \)-redundant in the clausal theory \(T \cup \{C\}\) whenever \(T \sqsubset \{C\}\).

In a slight abuse of notation, we allow Definition 8 to also refer to a single clause, i.e. in our notation \(T \sqsubset C\) is the same as \(T \sqsubset \{C\}\). We define a reduced theory:

Definition 9

(\(\sqsubset \)-reduced theory) A clausal theory is \(\sqsubset \)-reduced if and only if it is finite and it does not contain any \(\sqsubset \)-redundant clauses.

We define the input to the reduction problem:

Definition 10

(\(\sqsubset \)-reduction input) A reduction input is a pair \((T,\sqsubset )\) where T is a clausal theory and \(\sqsubset \) is a binary relation over a clausal theory.

Note that a reduction input may (and often will) be an infinite clausal theory. We define the reduction problem:

Definition 11

(\(\sqsubset \)-reduction problem) Let \((T,\sqsubset )\) be a reduction input. Then the \(\sqsubset \)-reduction problem is to find a finite theory \(T' \subseteq T\) such that (1) \(T' \sqsubset T\) (i.e. \(T' \sqsubset C\) for every clause C in T), and (2) \(T'\) is \(\sqsubset \)-reduced. We call \(T'\) a \(\sqsubset \)-reduction.

Although the input to a \(\sqsubset \)-reduction problem may contain an infinite theory, the output (a \(\sqsubset \)-reduction) must be a finite theory. We also introduce a variant of the \(\sqsubset \)-reduction problem where the reduction must obey certain syntactic restrictions:

Definition 12

(\({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem) Let (T,\(\sqsubset \),\({{{\mathscr {M}}}}^{{a}}_{m}\)) be a triple, where the first two elements are as in a standard reduction input and \({{{\mathscr {M}}}}^{{a}}_{m}\) is a target reduction theory. Then the \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem is to find a finite theory \(T' \subseteq T\) such that (1) \(T'\) is a \(\sqsubset \)-reduction of T, and (2) \(T'\) is in \({{{\mathscr {M}}}}^{{a}}_{m}\).

3.5 Subsumption reduction

The first form of reduction we consider is based on subsumption, which, as discussed in Sect. 2, is often used to eliminate redundancy in a clausal theory:

Definition 13

(Subsumption) A clause C subsumes a clause D, denoted as \(C \preceq D\), if there exists a substitution \(\theta \) such that \(C\theta \subseteq D\).

Note that if a clause C subsumes a clause D then \(C \models D\) (Robinson 1965). However, if \(C \models D\) then it does not necessarily follow that \(C \preceq D\). Subsumption can therefore be seen as being weaker than entailment. Whereas checking entailment between clauses is undecidable (Church 1936), Robinson (1965) showed that checking subsumption between clauses is decidable [although in general deciding subsumption is an NP-complete problem (Nienhuys-Cheng and de Wolf 1997)].
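A standard way to implement this check, and a minimal sketch of what a Prolog reduction implementation could use (the predicate names are ours), is to represent clauses as lists of encapsulated literals, freeze the variables of D, and search for a consistent match of C's literals:

% clause_subsumes(+C, +D): C subsumes D if some substitution theta
% makes C(theta) a subset of D; C and D are lists of literals
clause_subsumes(C, D) :-
    \+ \+ ( copy_term(D, D1),
            numbervars(D1, 0, _),   % freeze D's variables as constants
            subset_match(C, D1) ).

% match each literal of C against a literal of D, threading the
% bindings through the whole clause
subset_match([], _).
subset_match([L|Ls], D) :-
    member(L, D),
    subset_match(Ls, D).

The double negation undoes any bindings made during the test, so the check has no side effects on C.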

If T is a clausal theory then the pair \((T,\preceq )\) is an input to the \(\sqsubset \)-reduction problem, which leads to the subsumption reduction problem (S-reduction problem). We show that the S-reduction problem is decidable for finite theories:

Proposition 3

(Finite S-reduction problem decidability) Let T be a finite theory. Then the corresponding S-reduction problem is decidable.

Proof

We can enumerate each element \(T'\) of \(2^T\) in ascending order on the cardinality of \(T'\). For each \(T'\) we can check whether \(T'\) subsumes T, which is decidable because subsumption between clauses is decidable. If \(T'\) subsumes T then we correctly return \(T'\); otherwise we continue to enumerate. Because the set \(2^T\) is finite the enumeration must halt. Because the set \(2^T\) contains T the algorithm will in the worst-case return T. Thus the problem is decidable. \(\square \)

3.6 Entailment reduction

As mentioned in the introduction, Cropper and Muggleton (2014) previously used entailment reduction (Muggleton 1995) to reduce sets of metarules using the notion of an entailment redundant clause:

Definition 14

(E-redundant clause) The clause C is entailment redundant (E-redundant) in the clausal theory \(T \cup \{C\}\) whenever \(T\models C\).

If T is a clausal theory then the pair \((T,\models )\) is an input to the \(\sqsubset \)-reduction problem, which leads to the entailment reduction problem (E-reduction problem). We show the relationship between an E-reduction and an S-reduction:

Proposition 4

Let T be a clausal theory, \(T_S\) be an S-reduction of T, and \(T_E\) be an E-reduction of T. Then \(T_E \models T_S\).

Proof

Assume the opposite, i.e. \(T_E \not \models T_S\). This assumption implies that there is a clause \(C \in T_S\) such that \(T_E \not \models C\). By the definition of S-reduction, \(T_S\) is a subset of T so C must be in T, which implies that \(T_E \not \models T\). But this contradicts the premise that \(T_E\) is an E-reduction of T. Therefore the assumption cannot hold, and thus \(T_E \models T_S\). \(\square \)

We show that the E-reduction problem is undecidable for arbitrary clausal theories:

Proposition 5

(E-reduction problem clausal decidability) The E-reduction problem for clausal theories is undecidable.

Proof

Follows from the undecidability of entailment in clausal logic (Church 1936). \(\square \)

The E-reduction problem for Horn theories is also undecidable:

Proposition 6

(E-reduction problem Horn decidability) The E-reduction problem for Horn theories is undecidable.

Proof

Follows from the undecidability of entailment in Horn logic (Marcinkowski and Pacholski 1992). \(\square \)

The E-reduction problem is, however, decidable for finite Datalog theories:

Proposition 7

(E-reduction problem Datalog decidability) The E-reduction problem for finite Datalog theories is decidable.

Proof

Follows from the decidability of entailment in Datalog (Dantsin et al. 2001) using a similar algorithm to the one used in the proof of Proposition 3. \(\square \)

3.7 Derivation reduction

As mentioned in the introduction, entailment reduction can be too strong a form of reduction. We therefore describe a new form of reduction based on derivability (Cropper and Tourret 2018; Tourret and Cropper 2019). Although our notion of derivation reduction can be defined for any proof system [such as unconstrained resolution as is done in Tourret and Cropper (2019)] we focus on SLD-resolution because we want to reduce sets of metarules, which are definite clauses. We define the function \(R^n(T)\) of a Horn theory T as:

$$\begin{aligned} \begin{aligned}&R^0(T) = T \\&R^n(T) = \{C | C_1 \in R^{n-1}(T),C_2 \in T,C \, \hbox {is the binary resolvent of} \, C_1 \, \hbox {and} \, C_2\} \end{aligned} \end{aligned}$$

We use this function to define the Horn closure of a Horn theory:

Definition 15

(Horn closure) The Horn closure \(R^*(T)\) of a Horn theory T is:

$$\begin{aligned} \bigcup \limits _{n\in {\mathbb {N}}}R^n(T) \end{aligned}$$

We state our notion of derivability:

Definition 16

(Derivability) A Horn clause C is derivable from the Horn theory T, written \(T \vdash C\), if and only if \(C \in R^*(T)\).

We define a derivationally redundant (D-redundant) clause:

Definition 17

(D-redundant clause) A clause C is derivationally redundant in the Horn theory \(T \cup \{C\}\) if \(T \vdash C\).

Let T be a Horn theory. Then the pair \((T,\vdash )\) is an input to the \(\sqsubset \)-reduction problem, which leads to the derivation reduction problem (D-reduction problem). Note that a theory can have multiple D-reductions. For instance, consider the theory T:

$$\begin{aligned} \begin{aligned}&C_1 = P(A,B) \leftarrow Q(B,A) \\&C_2 = P(A,B) \leftarrow Q(A,C),R(C,B) \\&C_3 = P(A,B) \leftarrow Q(C,A),R(C,B) \end{aligned} \end{aligned}$$

One D-reduction of T is \(\{C_1,C_2\}\) because we can resolve the first body literal of \(C_2\) with \(C_1\) to derive \(C_3\) (up to variable renaming). Another D-reduction of T is \(\{C_1,C_3\}\) because we can likewise resolve the first body literal of \(C_3\) with \(C_1\) to derive \(C_2\).
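This derivation step is easy to mechanise. Below is a hedged Prolog sketch (the Head-BodyList clause representation and the resolve/3 name are ours) of a single SLD-resolution step between encapsulated clauses, which reproduces the derivation of \(C_3\) from \(C_1\) and \(C_2\):

% resolve(+C1, +C2, -R): R is a binary resolvent of definite clauses
% C1 and C2, obtained by resolving a body literal of C1 against the
% head of a renamed-apart copy of C2
resolve(H1-B1, C2, H1-B) :-
    copy_term(C2, H2-B2),   % rename the second clause apart
    select(L, B1, Rest),    % choose a body literal of the first clause
    L = H2,                 % unify it with the head of the second clause
    append(B2, Rest, B).    % collect the resolvent body

% ?- resolve(enc(P,A,B)-[enc(Q,A,C),enc(R,C,B)],
%            enc(P2,X,Y)-[enc(Q2,Y,X)], Res).
% Res = enc(P,A,B)-[enc(Q2,C,A),enc(R,C,B)]    (C3, up to renaming)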

We can show the relationship between E- and D-reductions by restating the notion of a SLD-deduction (Nienhuys-Cheng and de Wolf 1997):

Definition 18

[SLD-deduction (Nienhuys-Cheng and de Wolf 1997)] Let T be a Horn theory and C be a Horn clause. Then there exists a SLD-deduction of C from T, written \(T \vdash _d C\), if C is a tautology or if there exists a clause D such that \(T \vdash D\) and D subsumes C.

We can use the subsumption theorem (Nienhuys-Cheng and de Wolf 1997) to show the relationship between SLD-deductions and logical entailment:

Theorem 2

[SLD-subsumption theorem (Nienhuys-Cheng and de Wolf 1997)] Let T be a Horn theory and C be a Horn clause. Then \(T \models C\) if and only if \(T \vdash _d C\).

We can use this result to show the relationship between an E- and a D-reduction:

Proposition 8

Let T be a Horn theory, \(T_E\) be an E-reduction of T, and \(T_D\) be a D-reduction of T. Then \(T_E \models T_D\).

Proof

Follows from the definitions of E-reduction and D-reduction: since \(T_D \subseteq T\) and \(T_E \models T\), it follows that \(T_E \models T_D\). \(\square \)

We also use the SLD-subsumption theorem to show that the D-reduction problem is undecidable for Horn theories:

Theorem 3

(D-reduction problem Horn decidability) The D-reduction problem for Horn theories is undecidable.

Proof

Assume the opposite, that the problem is decidable, which implies that \(T \vdash C\) is decidable. Since \(T \vdash C\) is decidable and subsumption between Horn clauses is decidable (Garey and Johnson 1979), then finding a SLD-deduction is also decidable. Therefore, by the SLD-subsumption theorem, entailment between Horn clauses is decidable. However, entailment between Horn clauses is undecidable (Schmidt-Schauß 1988), so the assumption cannot hold. Therefore, the problem must be undecidable. \(\square \)

However, the D-reduction problem is decidable for any fragment \({{{\mathscr {M}}}}^{{a}}_{m}\) (e.g. definite Datalog clauses where each clause has at least one body literal, with additional arity and body size constraints). To show this result, we first introduce two lemmas:

Lemma 1

Let D, \(C_1\), and \(C_2\) be definite clauses with \(m_d\), \(m_{c1}\), and \(m_{c2}\) body literals respectively, where \(m_d\), \(m_{c1}\), and \(m_{c2} > 0\). If \(\{C_1,C_2\} \vdash D\) then \(m_{c1} \le m_{d}\) and \(m_{c2} \le m_{d}\).

Proof

Follows from the definitions of SLD-resolution (Nienhuys-Cheng and de Wolf 1997). \(\square \)

Note that Lemma 1 does not hold for unconstrained resolution because it allows for factorisation (Nienhuys-Cheng and de Wolf 1997). Lemma 1 also does not hold when facts (bodyless definite clauses) are allowed because they would allow for resolvents that are smaller in body size than one of the original two clauses.

Lemma 2

Let \({{{\mathscr {M}}}}^{{a}}_{m}\) be a fragment of metarules. Then \({{{\mathscr {M}}}}^{{a}}_{m}\) is finite up to variable renaming.

Proof

Any literal in \({{{\mathscr {M}}}}^{{a}}_{m}\) has at most a first-order variables and 1 second-order variable, so any literal has at most \(a+1\) variables. Any metarule has at most m body literals plus the head literal, so any metarule has at most \(m+1\) literals. Therefore, any metarule has at most \(((a+1)(m+1))\) variables. We can arrange the variables in at most \(((a+1)(m+1))!\) ways, so there are at most \(((a+1)(m+1))!\) metarules in \({{{\mathscr {M}}}}^{{a}}_{m}\) up to variable renaming. Thus \({{{\mathscr {M}}}}^{{a}}_{m}\) is finite up to variable renaming. \(\square \)
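For instance, for the fragment \({{{\mathscr {M}}}}^{{\{2\}}}_{2}\) this worst-case bound is \(((2+1)(2+1))! = 9! = 362880\) metarules.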

Note that the bound in the proof of Lemma 2 is a worst-case result. In practice there are fewer usable metarules because we consider fragments of constrained theories, thus not all clauses are admissible, and in all cases the order of the body literals is irrelevant. We use these two lemmas to show that the D-reduction problem is decidable for \({{{\mathscr {M}}}}^{{a}}_{m}\):

Theorem 4

(\({{{\mathscr {M}}}}^{{a}}_{m}\)-D-reduction problem decidability) The D-reduction problem for theories included in \({{{\mathscr {M}}}}^{{a}}_{m}\) is decidable.

Proof

Let T be a finite clausal theory in \({{{\mathscr {M}}}}^{{a}}_{m}\) and C be a definite clause with \(n>0\) body literals. The problem is whether \(T \vdash C\) is decidable. By Lemma 1, we cannot derive C from any clause which has more than n body literals. We can therefore restrict the resolution closure \(R^*(T)\) to only include clauses with body lengths less than or equal to n. In addition, by Lemma 2 there are only a finite number of such clauses, so we can compute the fixed-point of \(R^*(T)\) restricted to clauses of size smaller than or equal to n in a finite number of steps and check whether C is in the set. If it is then \(T \vdash C\); otherwise \(T \not \vdash C\). \(\square \)

3.8 k-Derivable clauses

Propositions 3 and 7 and Theorem 4 show that the \(\sqsubset \)-reduction problem is decidable under certain conditions. However, as we will show in Sect. 4, even in decidable cases, solving the \(\sqsubset \)-reduction problem is computationally expensive. We therefore solve restricted k-bounded versions of the E- and D-reduction problems, which both rely on SLD-derivations. Specifically, we focus on resolution depth-limited derivations using the notion of k-derivability:

Definition 19

(k-derivability) Let k be a natural number. Then a Horn clause C is k-derivable from the Horn theory T, written \(T \vdash _k C\), if and only if \(C \in R^k(T)\).

The definitions for k-bounded E- and D-reductions follow from this definition but are omitted for brevity. In Sect. 4 we introduce a general algorithm (Algorithm 1) to solve the S-reduction problem and k-bounded E- and D-reduction problems.
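As a hedged sketch of how such a bounded check can be implemented (reusing the resolve/3 sketch from Sect. 3.7; the predicate names are ours), one can naively accumulate the clauses derivable in at most k steps and test membership up to variable renaming:

% closure_k(+T, +K, -Rk): clauses derivable from T in at most K binary
% resolution steps (a naive accumulation of R^0(T),...,R^K(T))
closure_k(T, 0, T).
closure_k(T, K, Rk) :-
    K > 0, K1 is K - 1,
    closure_k(T, K1, Rk1),
    findall(R, (member(C1, Rk1), member(C2, T), resolve(C1, C2, R)), Rs),
    append(Rk1, Rs, Rk).

% derivable_k(+T, +C, +K): T |-_k C, where =@= tests equality up to
% variable renaming (SWI-Prolog)
derivable_k(T, C, K) :-
    closure_k(T, K, Rk),
    member(D, Rk),
    D =@= C, !.

Such an accumulation grows quickly with k, which motivates the small resolution-depth bounds used in Sect. 5.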

4 Reduction algorithms

In Sect. 5 we logically reduce sets of metarules. We now describe the reduction algorithms that we use.

4.1 \(\sqsubset \)-Reduction algorithm

The reduce algorithm (Algorithm 1) shows a general \(\sqsubset \)-reduction algorithm that solves the \(\sqsubset \)-reduction problem (Definition 11) when the input theory is finite. We ignore cases where the input is infinite because of the inherent undecidability of the problem. Algorithm 1 is largely based on Plotkin’s clausal reduction algorithm (Plotkin 1971). Given a finite clausal theory T and a binary relation \(\sqsubset \), the algorithm repeatedly tries to remove a \(\sqsubset \)-redundant clause in T. If it cannot find a \(\sqsubset \)-redundant clause, then it returns the \(\sqsubset \)-reduced theory. Note that since derivation reduction is only defined over Horn theories, in a \(\vdash \)-reduction input \((T,\vdash )\), the theory T has to be Horn. We show total correctness of the algorithm:

Proposition 9

(Algorithm 1 total correctness) Let (T,\(\sqsubset \)) be a \(\sqsubset \)-reduction input where T is finite. Let the corresponding \(\sqsubset \)-reduction problem be decidable. Then Algorithm 1 solves the \(\sqsubset \)-reduction problem.

Proof

Trivial by induction on the size of T. \(\square \)

Algorithm 1 The general \(\sqsubset \)-reduction algorithm
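The following is a minimal Prolog sketch of Algorithm 1 (the reduce/3 and s_redundant/2 names are ours); the redundancy relation is passed as a goal, e.g. a subsumption test built from the clause_subsumes/2 sketch in Sect. 3.5, or the derivable_k/3 sketch in Sect. 3.8:

% reduce(+T0, +Redundant, -T): repeatedly remove a clause that is
% redundant w.r.t. the remaining clauses, until none is left
reduce(T0, Redundant, T) :-
    select(C, T0, Rest),
    call(Redundant, Rest, C), !,   % C is redundant: discard it
    reduce(Rest, Redundant, T).
reduce(T, _, T).                   % no redundant clause: T is reduced

% example: S-redundancy of a clause w.r.t. a theory
s_redundant(Rest, C) :- member(D, Rest), clause_subsumes(D, C).
% ?- reduce(Theory, s_redundant, Reduced).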

Note that Proposition 9 assumes that the given reduction problem is decidable and that the input theory is finite. If you call Algorithm 1 with an arbitrary clausal theory and the \(\models \) relation then it will not necessarily terminate. We can call Algorithm 1 with specific binary relations, where each variation has a different time complexity. Table 2 shows different ways of calling Algorithm 1 with their corresponding time complexities, where we assume finite theories as input. We show the complexity of calling Algorithm 1 with the subsumption relation:

Proposition 10

(S-reduction complexity) If T is a finite clausal theory then calling Algorithm 1 with (T,\(\preceq \)) requires at most \(O(|T|^3)\) calls to a subsumption algorithm.

Proof

For every clause C in T the algorithm checks whether any other clause in T subsumes C, which requires at most \(O(|T|^2)\) calls to a subsumption algorithm. If any clause C is found to be S-redundant then the algorithm repeats the procedure on the theory \(T \setminus \{C\}\), so overall the algorithm requires at most \(O(|T|^3)\) calls to a subsumption algorithm. \(\square \)

Note that a more detailed analysis of calling Algorithm 1 with the subsumption relation would depend on the subsumption algorithm used, which is an NP-complete problem (Garey and Johnson 1979). We show the complexity of calling Algorithm 1 with the k-bounded entailment relation:

Proposition 11

(k-bounded E-reduction complexity) If T is a finite Horn theory and k is a natural number then calling Algorithm 1 with (T,\(\models _k\)) requires at most \(O(|T|^{k+2})\) resolutions.

Proof

In the worst case the derivation check (line 4) requires searching the whole SLD-tree which has a maximum branching factor |T| and a maximum depth k and takes \(O(|T|^{k})\) steps. The algorithm potentially does this step for every clause in T so the complexity of this step is \(O(|T|^{k+1})\). The algorithm has to perform this check for every clause in T with an overall worst-case complexity \(O(|T|^{k+2})\). \(\square \)

The complexity of calling Algorithm 1 with the k-derivation relation is identical:

Proposition 12

(k-bounded D-reduction complexity) Let T be a finite Horn theory and k be a natural number. Then calling Algorithm 1 with (T,\(\vdash _k\)) requires at most \(O(|T|^{k+2})\) resolutions.

Proof

Follows using the same reasoning as Proposition 11. \(\square \)

Table 2 Outputs and complexity of Algorithm 1 for different input relations and an arbitrary finite clausal theory T

4.2 \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction algorithm

Although Algorithm 1 solves the \(\sqsubset \)-reduction problem, it does not solve the \({{{\mathscr {M}}}}^{{a}}_{m}\)-reduction problem (Definition 12). For instance, suppose you have the following theory T in \({{{\mathscr {M}}}}^{{2}}_{4}\):

$$\begin{aligned} \begin{aligned}&M_1 = P(A,B) \leftarrow Q(B,A) \\&M_2 = P(A,B) \leftarrow Q(A,A),R(B,B) \\&M_3 = P(A,B) \leftarrow Q(A,C),R(B,C) \\&M_4 = P(A,B) \leftarrow Q(B,C),R(A,D),S(A,D),T(B,C) \end{aligned} \end{aligned}$$

Suppose you want to know whether T can be E-reduced to \({{{\mathscr {M}}}}^{{2}}_{2}\). Then calling Algorithm 1 with (\(T,\models \)) (i.e. the entailment relation) will return \(T' = \{M_1,M_4\}\) because \(M_4 \models M_2\), \(M_4 \models M_3\), and \(\{M_1,M_4\}\) cannot be further E-reduced.

Although \(T'\) is an E-reduction of T, it is not in \({{{\mathscr {M}}}}^{{2}}_{2}\) because \(M_4\) is not in \({{{\mathscr {M}}}}^{{2}}_{2}\). However, the theory T can be \({{{\mathscr {M}}}}^{{2}}_{2}\)-E-reduced to \(\{M_1,M_2,M_3\}\) because \(\{M_2,M_3\} \models M_4\), and \(\{M_1,M_2,M_3\}\) cannot be further reduced. In general, if T is a theory in \({{{\mathscr {M}}}}^{{a}}_{m}\) and \(T'\) is an E-reduction of T, then \(T'\) is not necessarily in \({{{\mathscr {M}}}}^{{a}}_{2}\).

Algorithm 2 overcomes this limitation of Algorithm 1. Given a finite clausal theory T, a binary relation \(\sqsubset \), and a reduction fragment \({{{\mathscr {M}}}}^{{a}}_{m}\), Algorithm 2 determines whether there is a \(\sqsubset \)-reduction of T in \({{{\mathscr {M}}}}^{{a}}_{m}\). If there is, it returns the reduced theory; otherwise it returns false. In other words, Algorithm 2 solves the \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem. We show total correctness of Algorithm 2:

Proposition 13

(Algorithm 2 correctness) Let (T,\(\sqsubset \),\({{{\mathscr {M}}}}^{{a}}_{m}\)) be a \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction input. If the corresponding \(\sqsubset \)-reduction problem is decidable then Algorithm 2 solves the corresponding \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem.

Sketch Proof

We provide a sketch proof for brevity. We need to show that the function aux correctly determines whether \(B \sqsubset T\), which we can show by induction on the size of T. Assuming aux is correct, then if T can be reduced to B, the mreduce function calls Algorithm 1 to reduce B, which is correct by Proposition 9. Otherwise it returns false. \(\square \)

Algorithm 2 The \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction algorithm
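A hedged Prolog sketch of Algorithm 2, reusing reduce/3 from Sect. 4.1 (the predicate names and the fragment test are our illustrative assumptions):

% in_fragment(+A, +M, +Clause): Clause (a Head-BodyList pair of
% encapsulated literals) is in the fragment M^a_m
in_fragment(A, M, H-B) :-
    length(B, Len), Len >= 1, Len =< M,  % at most M body literals
    maplist(arity_at_most(A), [H|B]).

arity_at_most(A, Lit) :-
    functor(Lit, enc, N),
    N - 1 =< A.   % one argument of enc holds the predicate variable

% mreduce(+T, +Redundant, +InFragment, -R): find a reduction of T that
% lies inside the fragment, or fail if none exists
mreduce(T, Redundant, InFragment, R) :-
    include(InFragment, T, B),                    % clauses of T in the fragment
    forall(member(C, T), call(Redundant, B, C)),  % B must make all of T redundant
    reduce(B, Redundant, R).                      % then reduce B itself

% ?- mreduce(Theory, Redundant, in_fragment(2,2), Reduced).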

5 Reduction of metarules

We now logically reduce fragments of metarules. Given a fragment \({{{\mathscr {M}}}}^{{a}}_{m}\) and a reduction operator \(\sqsubset \), we have three main goals:

  • G1: identify a \({{{\mathscr {M}}}}^{{a}}_{k}\)-\(\sqsubset \)-reduction of \({{{\mathscr {M}}}}^{{a}}_{m}\) for some k as small as possible

  • G2: determine whether \({{{\mathscr {M}}}}^{{a}}_{2} \sqsubset {} {{{\mathscr {M}}}}^{{a}}_{\infty }\)

  • G3: determine whether \({{{\mathscr {M}}}}^{{a}}_{\infty }\) has any (finite) \(\sqsubset \)-reduction

We work on these goals for fragments of \({{{\mathscr {M}}}}^{{a}}_{m}\) relevant to ILP. Table 3 shows the four fragments and their main restrictions. The subsequent sections precisely describe the fragments.

Our first goal (G1) is essentially to minimise the number of body literals in a set of metarules, which can be seen as enforcing an Occamist bias. We are particularly interested in reducing sets of metarules to fragments with at most two body literals because \({{{\mathscr {M}}}}^{{\{2\}}}_{2}\) augmented with one function symbol has universal Turing machine expressivity (Tärnlund 1977). In addition, previous work on MIL has almost exclusively used metarules from the fragment \({{{\mathscr {M}}}}^{{2}}_{2}\). Our second goal (G2) is more general and concerns reducing an infinite set of metarules to \({{{\mathscr {M}}}}^{{a}}_{2}\). Our third goal (G3) is similar, but is about determining whether an infinite set of metarules has any finite reduction.

We work on the goals by first applying the reduction algorithms described in the previous section to finite fragments restricted to 5 body literals (i.e. \({{{\mathscr {M}}}}^{{a}}_{5}\)). This value gives us a sufficiently large set of metarules to reduce but not so large that the reduction problem becomes intractable. When running the E- and D-reduction algorithms (both k-bounded), we use a resolution-depth bound of 7, which is the largest value for which the algorithms terminate in reasonable time. After applying the reduction algorithms to the finite fragments, we then try to solve G2 by extrapolating the results to the infinite case (i.e. \({{{\mathscr {M}}}}^{{a}}_{\infty }\)). In cases where \({{{\mathscr {M}}}}^{{a}}_{2} \not \sqsubset {} {{{\mathscr {M}}}}^{{a}}_{\infty }\), we then try to solve G3 by seeing whether there exists any natural number k such that \({{{\mathscr {M}}}}^{{a}}_{k} \sqsubset {} {{{\mathscr {M}}}}^{{a}}_{\infty }\).

Table 3 The four main fragments of \({{{\mathscr {M}}}}^{{}}_{}\) that we consider

5.1 Connected (\({{{\mathscr {C}}}}^{a}_{m}\)) results

We first consider a general fragment of metarules. The only constraint is that we follow the standard ILP convention (Cropper and Muggleton 2014; Evans and Grefenstette 2018; Gottlob et al. 1997; Nienhuys-Cheng and de Wolf 1997) and focus on connected clauses:

Definition 20

(Connected clause) A clause is connected if the literals in the clause cannot be partitioned into two sets such that the variables appearing in the literals of one set are disjoint from the variables appearing in the literals of the other set.

The following clauses are all connected:

$$\begin{aligned} \begin{aligned}&P(A) \leftarrow Q(A) \\&P(A,B) \leftarrow Q(A,C) \\&P(A,B) \leftarrow Q(A,B),R(B,D),S(D,B) \end{aligned} \end{aligned}$$

By contrast, these clauses are not connected:

$$\begin{aligned} \begin{aligned}&P(A) \leftarrow Q(B) \\&P(A,B) \leftarrow Q(A),R(C) \\&P(A,B) \leftarrow Q(A,B),S(C) \end{aligned} \end{aligned}$$

We denote the connected fragment of \({{{\mathscr {M}}}}^{{a}}_{m}\) as \({{{\mathscr {C}}}}^{a}_{m}\). Table 4 shows the maximum body size and the cardinality of the reductions obtained when applying the reduction algorithms to \({{{\mathscr {C}}}}^{a}_{5}\) for different values of a. To give an idea of the scale of the reductions, the fragment \({{{\mathscr {C}}}}^{\{1,2\}}_{5}\) contains 77398 unique metarules, of which E-reduction removed all but two. Table 5 shows the actual reductions for \({{{\mathscr {C}}}}^{\{1,2\}}_{5}\). Reductions for other connected fragments are in Appendix A.1.

Table 4 Cardinality and maximal body size of the reductions of \({{{\mathscr {C}}}}^{a}_{5}\)
Table 5 Reductions of the connected fragment \({{{\mathscr {C}}}}^{\{1,2\}}_{5}\)

As Table 4 shows, all the fragments can be S- and E-reduced to \({{{\mathscr {C}}}}^{a}_{1}\). We show that in general \({{{\mathscr {C}}}}^{a}_{\infty }\) has a \({{{\mathscr {C}}}}^{a}_{1}\)-S-reduction:

Theorem 5

(\({{{\mathscr {C}}}}^{a}_{\infty }\) S-reducibility) For all \(a>0\), the fragment \({{{\mathscr {C}}}}^{a}_{\infty }\) has a \({{{\mathscr {C}}}}^{a}_{1}\)-S-reduction.

Proof

Let C be any clause in \({{{\mathscr {C}}}}^{a}_{\infty }\), where \(a>0\). By the definition of connected clauses there must be at least one body literal in C that shares a variable with the head literal of C. The clause formed of the head of C with the body literal directly connected to it is by definition in \({{{\mathscr {C}}}}^{a}_{1}\) and clearly subsumes C. Therefore \({{{\mathscr {C}}}}^{a}_{1} \preceq {{{\mathscr {C}}}}^{a}_{\infty }\). \(\square \)

We likewise show that \({{{\mathscr {C}}}}^{a}_{\infty }\) always has a \({{{\mathscr {C}}}}^{a}_{1}\)-E-reduction:

Theorem 6

(\({{{\mathscr {C}}}}^{a}_{\infty }\) E-reducibility) For all \(a>0\), the fragment \({{{\mathscr {C}}}}^{a}_{\infty }\) has a \({{{\mathscr {C}}}}^{a}_{1}\)-E-reduction.

Proof

Follows from Theorem 5 and Proposition 4. \(\square \)

As Table 4 shows, the fragment \({{{\mathscr {C}}}}^{2}_{5}\) could not be D-reduced to \({{{\mathscr {C}}}}^{2}_{2}\) when running the derivation reduction algorithm. However, because we run the derivation reduction algorithm with a maximum derivation depth, this result alone is not enough to guarantee that the output cannot be further reduced. Therefore, we show that \({{{\mathscr {C}}}}^{2}_{5}\) cannot be D-reduced to \({{{\mathscr {C}}}}^{2}_{2}\):

Proposition 14

(\({{{\mathscr {C}}}}^{2}_{5}\) D-irreducibility) The fragment \({{{\mathscr {C}}}}^{2}_{5}\) has no \({{{\mathscr {C}}}}^{2}_{2}\)-D-reduction.

Proof

We denote by \({{{\mathscr {P}}}}(C)\) the set of all clauses that can be obtained from a given clause C by permuting the arguments in its literals up to variable renaming. For example if \(C=P(A,B)\leftarrow Q(A,C)\) then \({{{\mathscr {P}}}}(C)=\{(C),(P(A,B)\leftarrow Q(C,A)),(P(B,A)\leftarrow Q(A,C)),(P(B,A)\leftarrow Q(C,A))\}\) up to variable renaming.

Let \(C_I\) denote the clause \(P(A,B) \leftarrow Q(A,C),R(A,D),S(B,C),T(B,D),U(C,D)\). We prove that no clause in \({{{\mathscr {P}}}}(C_I)\) can be derived from \({{{\mathscr {C}}}}^{2}_{2}\) by induction on the length of derivations. Formally, we show that there exist no derivations of length n from \({{{\mathscr {C}}}}^{2}_{2}\) to a clause in \({{{\mathscr {P}}}}(C_I)\). We reason by contradiction and w.l.o.g. we consider only the clause \(C_I\).

For the base case \(n=0\), assume that there is a derivation of length 0 from \({{{\mathscr {C}}}}^{2}_{2}\) to \(C_I\). This assumption implies that \(C_I\in {{{\mathscr {C}}}}^{2}_{2}\), but this clearly cannot hold given the body size of \(C_I\).

For the general case, assume that the property holds for all \(k<n\) and by contradiction consider the final inference in a derivation of length n of \(C_I\) from \({{{\mathscr {C}}}}^{2}_{2}\). Let \(C_1\) and \(C_2\) denote the premises of this inference. Then the literals occurring in \(C_I\) must occur up to variable renaming in at least one of \(C_1\) and \(C_2\). We consider the following cases separately.

  • All the literals of \(C_I\) occur in the same premise: because of Lemma 1, this case is impossible because this premise would contain more literals than \(C_I\) (the ones from \(C_I\) plus the resolved literal).

  • Only one of the literals of \(C_I\) occurs separately from the others: w.l.o.g., assume that the literal Q(A,C) occurs alone in \(C_2\) (up to variable renaming). Then \(C_2\) must be of the form \(H(A,C)\leftarrow Q(A,C)\) or \(H(C,A)\leftarrow Q(A,C)\) for some H, where the H-headed literal is the resolved literal of the inference that allows the unification of A and C with their counterparts in \(C_1\). In this case, \(C_1\) belongs to \({{{\mathscr {P}}}}(C_I)\) and a derivation of \(C_1\) from \({{{\mathscr {C}}}}^{2}_{2}\) of length smaller than n exists as a strict subset of the derivation to \(C_I\) of length n. This contradicts the induction hypothesis, thus the assumed derivation of \(C_I\) cannot exist.

  • Otherwise, the split of the literals of \(C_I\) between \(C_1\) and \(C_2\) is always such that at least three variables must be unified during the inference. For example, consider the case where \(P(A,B) \leftarrow Q(A,C) \subset C_1\) and the set \(\{R(A',D),S(B',C'),T(B',D),U(C',D)\}\) occurs in the body of \(C_2\) (up to variable renaming). Then \(A'\), \(B'\) and \(C'\) must unify respectively with A, B and C for \(C_I\) to be derived (up to variable renaming). However, the inference can unify at most two variable pairs, since the resolved literal is at most dyadic; thus this inference is impossible, a contradiction.

Thus neither \(C_I\) nor any clause in \({{{\mathscr {P}}}}(C_I)\) can be derived from \({{{\mathscr {C}}}}^{2}_{2}\). Note that, since \({{{\mathscr {P}}}}(C_I)\) is a subset of neither \({{{\mathscr {C}}}}^{2}_{3}\) nor \({{{\mathscr {C}}}}^{2}_{4}\), this proof also shows that \({{{\mathscr {P}}}}(C_I)\) cannot be derived from \({{{\mathscr {C}}}}^{2}_{3}\) or \({{{\mathscr {C}}}}^{2}_{4}\). \(\square \)

We generalise this result to \({{{\mathscr {C}}}}^{2}_{\infty }\):

Theorem 7

(\({{{\mathscr {C}}}}^{2}_{\infty }\) D-irreducibility) The fragment \({{{\mathscr {C}}}}^{2}_{\infty }\) has no D-reduction.

Proof

It is enough to prove that \({{{\mathscr {C}}}}^{2}_{\infty }\) does not have a \({{{\mathscr {C}}}}^{2}_{m}\)-D-reduction for arbitrary m, because any D-reduced theory, being finite, admits a bound on the body size of the clauses it contains. Starting from \(C_I\) as defined in the proof of Proposition 14, apply the following transformation iteratively for k from 1 to m: replace the literals containing Q and R (i.e. \(Q(A,C_{k-1})\) and \(R(A,D_{k-1})\), where \(C_0=C\) and \(D_0=D\)) with the set of literals \(Q(A,C_k)\), \(R(A,D_k)\), \(V_k(C_k,D_k)\), \(Q_k(C_k,C_{k-1})\), \(R_k(D_k,D_{k-1})\), where all first-order and predicate variables labelled with k are new. Let the resulting clause be denoted \(C_{I_m}\). This clause has body size \(3m+5\) and thus does not belong to \({{{\mathscr {C}}}}^{2}_{m}\). Moreover, for the same reason that \(C_I\) cannot be derived from any \({{{\mathscr {C}}}}^{2}_{m'}\) with \(m'<5\) (see the proof of Proposition 14), \(C_{I_m}\) cannot be derived from any \({{{\mathscr {C}}}}^{2}_{m'}\) with \(m'<3m+5\). In particular, \(C_{I_m}\) cannot be derived from \({{{\mathscr {C}}}}^{2}_{m}\). \(\square \)
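The iterative construction of \(C_{I_m}\) is mechanical, so a short sketch may help. The following Python function (ours, using the tuple representation of the earlier sketch) builds \(C_{I_m}\) and checks that its body size is \(3m+5\).

```python
# Illustrative sketch: build C_{I_m} from the proof of Theorem 7 by
# iteratively replacing the current Q- and R-literals.
def build_c_i_m(m):
    # C_I = P(A,B) :- Q(A,C), R(A,D), S(B,C), T(B,D), U(C,D)
    q, r = ("Q", ("A", "C")), ("R", ("A", "D"))
    rest = [("S", ("B", "C")), ("T", ("B", "D")), ("U", ("C", "D"))]
    for k in range(1, m + 1):
        ck, dk = f"C{k}", f"D{k}"
        c_old, d_old = q[1][1], r[1][1]  # second arguments of current Q and R
        rest += [(f"V{k}", (ck, dk)), (f"Q{k}", (ck, c_old)), (f"R{k}", (dk, d_old))]
        q, r = ("Q", ("A", ck)), ("R", ("A", dk))
    body = [q, r] + rest
    assert len(body) == 3 * m + 5  # body size matches the proof
    return (("P", ("A", "B")), body)
```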

Another way to generalise Proposition 14 is the following:

Theorem 8

(\({{{\mathscr {C}}}}^{a}_{\infty }\) D-irreducibility) For \(a\ge 2\), the fragment \({{{\mathscr {C}}}}^{a}_{\infty }\) has no \({{{\mathscr {C}}}}^{a}_{a^2+a-2}\)-D-reduction.

Proof

Let \(C_a\) denote the clause

$$\begin{aligned} \begin{aligned} C_a = P(A_1,\dots ,A_a)\leftarrow&Q_{1,1}(A_1,B_{1,1},\dots ,B_{1,a-1}),\dots ,Q_{1,a}(A_1,B_{a,1},\dots ,B_{a,a-1}),\\&\dots \\&Q_{a,1}(A_a,B_{1,1},\dots ,B_{1,a-1}),\dots ,Q_{a,a}(A_a,B_{a,1},\dots ,B_{a,a-1}),\\&R_1(B_{1,1},\dots ,B_{a,1}),\dots ,R_{a-1}(B_{1,a-1},\dots ,B_{a,a-1}) \end{aligned} \end{aligned}$$

Note that for \(a = 2\), the clause \(C_a\) coincides with \(C_I\) from the proof of Proposition 14. In fact, to show that \(C_a\) is irreducible for any a, it is enough to follow the proof of Proposition 14 with \(C_a\) substituted for \(C_I\) and with the last case generalised as follows:

  • the split of the literals of \(C_a\) between \(C_1\) and \(C_2\) is always such that at least \(a+1\) variables must be unified during the inference, which is impossible since the resolved literal can hold at most a variables.

The reason this proof holds is that any subset of \(C_a\) containing at least two literals involves at least \(a+1\) distinct variables. Since \(C_a\) has body size \(a^2+a-1\), this counter-example proves that \({{{\mathscr {C}}}}^{a}_{\infty }\) has no \({{{\mathscr {C}}}}^{a}_{a^2+a-2}\)-D-reduction. \(\square \)

Note that this is enough to conclude that \({{{\mathscr {C}}}}^{a}_{\infty }\) cannot be D-reduced to \({{{\mathscr {C}}}}^{a}_{2}\), but it does not prove that \({{{\mathscr {C}}}}^{a}_{\infty }\) has no D-reduction at all.

5.1.1 Summary

Table 6 summarises our theoretical results from this section. Theorems 5 and 6 show that \({{{\mathscr {C}}}}^{a}_{\infty }\) can always be S- and E-reduced to \({{{\mathscr {C}}}}^{a}_{1}\) respectively. By contrast, Theorem 7 shows that \({{{\mathscr {C}}}}^{2}_{\infty }\) cannot be D-reduced to \({{{\mathscr {C}}}}^{2}_{2}\); in fact, it shows that \({{{\mathscr {C}}}}^{2}_{\infty }\) has no D-reduction at all. Theorem 7 has direct (negative) implications for MIL systems such as Metagol and HEXMIL. We discuss these implications in more detail in Sect. 7.

Table 6 Existence of a S-, E- or D-reduction of \({{{\mathscr {C}}}}^{a}_{\infty }\) to \({{{\mathscr {C}}}}^{a}_{2}\)

5.2 Datalog (\({{{\mathscr {D}}}}^{a}_{m}\)) results

We now consider Datalog clauses, which are often used in ILP (Albarghouthi et al. 2017; Cropper and Muggleton 2016a; Evans and Grefenstette 2018; Kaminski et al. 2018; Muggleton et al. 2015; Si et al. 2018). The relevant Datalog restriction is that every variable that appears in the head of a clause must also appear in a body literal. For example, in the S-reduction of \({{{\mathscr {C}}}}^{\{1,2\}}_{5}\) in Table 5, the clause \(P(A,B) \leftarrow Q(B)\) is not a Datalog clause because the variable A appears in the head but not in the body. We denote the Datalog fragment of \({{{\mathscr {C}}}}^{a}_{m}\) as \({{{\mathscr {D}}}}^{a}_{m}\). Table 7 shows the results of applying the reduction algorithms to \({{{\mathscr {D}}}}^{a}_{5}\) for different values of a. Table 8 shows the reductions for the fragment \({{{\mathscr {D}}}}^{\{1,2\}}_{5}\), which are used in Experiment 3 (Sect. 6.3) to induce Datalog game rules from observations. Reductions for other Datalog fragments are in Appendix "A.2".
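For illustration, the Datalog restriction is easy to operationalise. The following sketch (ours, same clause representation as the earlier sketches) checks it.

```python
# Illustrative sketch: a clause satisfies the Datalog restriction if every
# variable in its head also appears in some body literal.
def is_datalog(clause):
    head, body = clause
    head_vars = set(head[1])
    body_vars = {v for (_, args) in body for v in args}
    return head_vars <= body_vars

# P(A,B) :- Q(B) violates the restriction: A is missing from the body.
print(is_datalog((("P", ("A", "B")), [("Q", ("B",))])))  # False
```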

Table 7 Cardinality and maximal body size of the reductions of \({{{\mathscr {D}}}}^{a}_{5}\)
Table 8 Reductions of the Datalog fragment \({{{\mathscr {D}}}}^{\{1,2\}}_{5}\)

We show that \({{{\mathscr {D}}}}^{2}_{\infty }\) can be S-reduced to \({{{\mathscr {D}}}}^{2}_{2}\):

Proposition 15

(\({{{\mathscr {D}}}}^{2}_{\infty }\) S-reducibility) The fragment \({{{\mathscr {D}}}}^{2}_{\infty }\) has a \({{{\mathscr {D}}}}^{2}_{2}\)-S-reduction.

Proof

Follows by the same argument as Theorem 5, except that the reduction is to \({{{\mathscr {D}}}}^{2}_{2}\) instead of \({{{\mathscr {D}}}}^{2}_{1}\). This difference is due to the Datalog constraint that every variable appearing in the head must also appear in the body: for clauses with dyadic heads, if the two head argument variables occur in two distinct body literals then the clause cannot be reduced beyond \({{{\mathscr {D}}}}^{2}_{2}\). \(\square \)

We show that this result cannot be generalised to \({{{\mathscr {D}}}}^{a}_{\infty }\):

Theorem 9

(\({{{\mathscr {D}}}}^{a}_{\infty }\) S-irreducibility) For \(a>0\), the fragment \({{{\mathscr {D}}}}^{a}_{\infty }\) does not have a \({{{\mathscr {D}}}}^{a}_{a-1}\)-S-reduction.

Proof

As a counter-example to a \({{{\mathscr {D}}}}^{a}_{a-1}\)-S-reduction, consider \(C_a=P(X_1,\dots ,X_a)\leftarrow Q_1(X_1), \dots ,Q_a(X_a)\). The clause \(C_a\) does not belong to \({{{\mathscr {D}}}}^{a}_{a-1}\) and cannot be S-reduced to it: removing any non-empty subset of its body literals leaves argument variables in the head without their counterparts in the body, so no proper subclause of \(C_a\) belongs to the Datalog fragment. Thus \(C_a\) cannot be subsumed by a clause in \({{{\mathscr {D}}}}^{a}_{a-1}\). \(\square \)
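As an illustration, the counterexample family \(C_a\) is straightforward to construct, and removing any body literal from it makes the is_datalog check sketched above fail (sketch ours).

```python
# Illustrative sketch: C_a = P(X1,...,Xa) :- Q1(X1), ..., Qa(Xa).
def s_reduction_counterexample(a):
    head = ("P", tuple(f"X{i}" for i in range(1, a + 1)))
    body = [(f"Q{i}", (f"X{i}",)) for i in range(1, a + 1)]
    return (head, body)

head, body = s_reduction_counterexample(3)
# Dropping any body literal leaves a head variable with no body
# occurrence, so no proper subclause is a Datalog clause.
for i in range(len(body)):
    rest = body[:i] + body[i + 1:]
    assert not set(head[1]) <= {v for (_, args) in rest for v in args}
```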

However, we can show that \({{{\mathscr {D}}}}^{a}_{\infty }\) can always be S-reduced to \({{{\mathscr {D}}}}^{a}_{a}\):

Theorem 10

(\({{{\mathscr {D}}}}^{a}_{\infty }\) to \({{{\mathscr {D}}}}^{a}_{a}\) S-reducibility) For \(a>0\), the fragment \({{{\mathscr {D}}}}^{a}_{\infty }\) has a \({{{\mathscr {D}}}}^{a}_{a}\)-S-reduction.

Proof

To prove that \({{{\mathscr {D}}}}^{a}_{\infty }\) has a \({{{\mathscr {D}}}}^{a}_{a}\)-S-reduction, it is enough to remark that any clause in \({{{\mathscr {D}}}}^{a}_{\infty }\) has a subclause of body size at most a that is also in \({{{\mathscr {D}}}}^{a}_{\infty }\), the worst case being clauses such as \(C_a\) where each argument variable in the head occurs in a distinct body literal. \(\square \)

We also show that \({{{\mathscr {D}}}}^{a}_{\infty }\) always has a \({{{\mathscr {D}}}}^{a}_{2}\)-E-reduction, starting with the following lemma:

Lemma 3

For \(a>0\) and \(n\in \{1,\dots ,a\}\), the clause

$$\begin{aligned} P_0(A_1,A_2,\dots ,A_n) \leftarrow P_1(A_1), P_2(A_2), \dots , P_n(A_n) \end{aligned}$$

is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible.

Proof

By induction on n.

  • For the base case, by definition \({{{\mathscr {D}}}}^{a}_{2}\) contains \(P_0(A_1) \leftarrow P_1(A_1)\) (for \(n=1\)) and \(P_0(A_1,A_2) \leftarrow P_1(A_1), P_2(A_2)\) (for \(n=2\)).

  • For the inductive step, assume the claim holds for \(n-1\); we show it holds for n. By definition \({{{\mathscr {D}}}}^{a}_{2}\) contains the clause \(D_1=P(A_1,A_2,\dots ,A_{n}) \leftarrow P_0(A_1,A_2,\dots ,A_{n-1}), P_n(A_{n})\). By the inductive hypothesis, \(D_2=P_0(A_1,A_2,\dots ,A_{n-1})\leftarrow P_1(A_1),\dots ,P_{n-1}(A_{n-1})\) is \({{{\mathscr {D}}}}^{a-1}_{2}\)-E-reducible, and thus also \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible. Together, \(D_1\) and \(D_2\) entail \(D=P(A_1,A_2,\dots ,A_n) \leftarrow P_1(A_1), P_2(A_2), \dots , P_n(A_n)\), which can be seen by resolving the literal \(P_0(A_1,A_2,\dots ,A_{n-1})\) in the body of \(D_1\) with the head of \(D_2\). Thus D, which is the clause of the lemma up to renaming of second-order variables, is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible.

\(\square \)

Theorem 11

(\({{{\mathscr {D}}}}^{a}_{\infty }\) E-reducibility) For \(a>0\), the fragment \({{{\mathscr {D}}}}^{a}_{\infty }\) has a \({{{\mathscr {D}}}}^{a}_{2}\)-E-reduction.

Proof

Let C be any clause in \({{{\mathscr {D}}}}^{a}_{\infty }\). We denote the head of C by \(P(A_1,\dots ,A_n)\), where \(0<n\le a\). The possibility that some of the \(A_i\) are equal does not impact the reasoning.

If \(n=1\), then by definition there exists a literal \(L_1\) in the body of C in which \(A_1\) occurs. The clause \(P(A_1)\leftarrow L_1\) belongs to \({{{\mathscr {D}}}}^{a}_{2}\) and entails C, because \(P(A_1)\) is the head of C and \(L_1\) belongs to the body of C, which is enough to conclude.

In the case where \(n>1\), there must exist literals \(L_1,\dots ,L_n\) in the body of C such that \(A_i\) occurs in \(L_i\) for \(i\in \{1,\dots ,n\}\). Consider the clause \(C' = P(A_1,\dots ,A_n) \leftarrow L_1,\dots ,L_n\). There are a few things to stress about \(C'\):

  • The clause \(C'\) belongs to \({{{\mathscr {D}}}}^{a}_{\infty }\).

  • Some \(L_i\) may be identical to each other, since the \(A_i\)s may occur together in literals or simply be equal, but this does not impact the reasoning.

  • The clause \(C'\) entails C because \(C'\) is equivalent to a subset of C (but this subset may be distinct from \(C'\) due to \(C'\) possibly including some extra duplicated literals).

Now consider the clause \(D = P(A_1,\dots ,A_n)\leftarrow P_1(A_1),\dots ,P_n(A_n)\). For \(i\in \{1,\dots ,n\}\), the clause \(P_i(A_i)\leftarrow L_i\) belongs to \({{{\mathscr {D}}}}^{a}_{2}\) by definition, thus \({{{\mathscr {D}}}}^{a}_{2}\cup \{D\}\vdash D'\) where \(D'=P(A_1,\dots ,A_n)\leftarrow L_1',\dots ,L_n'\) and each \(L_i'\) is a copy of \(L_i\) whose variables other than \(A_i\) are renamed apart. Moreover, by Lemma 3, D is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible, hence \(D'\) is also \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible. Note that if a variable occurs in distinct body literals \(L_i\) of \(C'\), this connection is not captured in \(D'\), where distinct fresh variables occur instead, so there is no guarantee that \(D'=C'\). For example, if \(C'=P(A_1,A_2)\leftarrow Q(A_1,B,A_2),R(A_2,B)\) then \(D'=P(A_1,A_2)\leftarrow Q(A_1,B,A_2'),R(A_2,B')\). However, it always holds that \(D'\models C'\), because \(D'\) subsumes \(C'\): in our small example, it is enough to consider the substitution \(\theta =\{B'/B,A_2'/A_2\}\) to observe this. Thus, by transitivity of entailment, C is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible. \(\square \)
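To see the final subsumption step concretely, the following sketch (ours) applies the substitution \(\theta \) from the example to the body of \(D'\) and checks that the result is a subset of the body of \(C'\).

```python
# Illustrative sketch: apply a substitution to a clause body and check
# theta-subsumption by set inclusion, for the example in the proof above.
def apply_subst(body, theta):
    return {(p, tuple(theta.get(v, v) for v in args)) for (p, args) in body}

c_body = {("Q", ("A1", "B", "A2")), ("R", ("A2", "B"))}    # body of C'
d_body = {("Q", ("A1", "B", "A2p")), ("R", ("A2", "Bp"))}  # body of D'
theta = {"A2p": "A2", "Bp": "B"}                           # {B'/B, A2'/A2}
print(apply_subst(d_body, theta) <= c_body)  # True: D' theta-subsumes C'
```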

As Table 7 shows, not all of the fragments can be D-reduced to \({{{\mathscr {D}}}}^{a}_{2}\). In particular, the result that \({{{\mathscr {D}}}}^{2}_{\infty }\) has no \({{{\mathscr {D}}}}^{2}_{2}\)-D-reduction follows from Theorem 7 because the counterexamples presented in the proof also belong to \({{{\mathscr {D}}}}^{2}_{\infty }\).

5.2.1 Summary

Table 9 summarises our theoretical results from this section. Theorem 9 shows that \({{{\mathscr {D}}}}^{a}_{\infty }\) never has a \({{{\mathscr {D}}}}^{a}_{a-1}\)-S-reduction. This result differs from the connected fragment, where \({{{\mathscr {C}}}}^{a}_{\infty }\) can always be S-reduced to \({{{\mathscr {C}}}}^{a}_{1}\). However, Theorem 10 shows that \({{{\mathscr {D}}}}^{a}_{\infty }\) can always be S-reduced to \({{{\mathscr {D}}}}^{a}_{a}\). As with the connected fragment, Theorem 11 shows that \({{{\mathscr {D}}}}^{a}_{\infty }\) can always be E-reduced to \({{{\mathscr {D}}}}^{a}_{2}\). The result that \({{{\mathscr {D}}}}^{2}_{\infty }\) has no D-reduction follows from Theorem 7.

Table 9 Existence of a S-, E- or D-reduction of \({{{\mathscr {D}}}}^{a}_{\infty }\) to \({{{\mathscr {D}}}}^{a}_{2}\)

5.3 Singleton-free (\({{{\mathscr {K}}}}^{a}_{m}\)) results

It is common in ILP to require that all the variables in a clause appear at least twice (Cropper and Muggleton 2014; Muggleton and Feng 1990; De Raedt and Bruynooghe 1992), which essentially eliminates singleton variables. We call this fragment the singleton-free fragment:

Definition 21

(Singleton-free) A clause is singleton-free if each first-order variable in the clause appears at least twice.

For example, in the E-reduction of the connected fragment \({{{\mathscr {C}}}}^{\{1,2\}}_{5}\) shown in Table 5, the clause \(P(A) \leftarrow Q(B,A)\) is not singleton-free because the variable B appears only once. We denote the singleton-free fragment of \({{{\mathscr {D}}}}^{a}_{m}\) as \({{{\mathscr {K}}}}^{a}_{m}\). Table 10 shows the results of applying the reduction algorithms to \({{{\mathscr {K}}}}^{a}_{5}\). Table 11 shows the reductions of \({{{\mathscr {K}}}}^{\{2\}}_{5}\). Reductions for other singleton-free fragments are in Appendix "A.3".
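Under the same clause representation as the earlier sketches, the singleton-free condition can be checked as follows (illustration ours).

```python
from collections import Counter

# Illustrative sketch: a clause is singleton-free if every first-order
# variable occurs at least twice across the whole clause.
def is_singleton_free(clause):
    head, body = clause
    counts = Counter(v for (_, args) in [head] + body for v in args)
    return all(n >= 2 for n in counts.values())

# P(A) :- Q(B,A) is not singleton-free: B occurs only once.
print(is_singleton_free((("P", ("A",)), [("Q", ("B", "A"))])))  # False
```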

Table 10 Cardinality and maximal body size of the reductions of \({{{\mathscr {K}}}}^{a}_{5}\)
Table 11 Reductions of the singleton-free fragment \({{{\mathscr {K}}}}^{\{2\}}_{5}\)

Unlike in the connected and Datalog cases, the fragment \({{{\mathscr {K}}}}^{\{2\}}_{5}\) is not S-reducible to \({{{\mathscr {K}}}}^{\{2\}}_{2}\). We show that \({{{\mathscr {K}}}}^{2}_{\infty }\) cannot be S-reduced to \({{{\mathscr {K}}}}^{2}_{2}\).

Proposition 16

(\({{{\mathscr {K}}}}^{2}_{\infty }\) S-irreducibility) The fragment \({{{\mathscr {K}}}}^{2}_{\infty }\) does not have a \({{{\mathscr {K}}}}^{2}_{2}\)-S-reduction.

Proof

As a counter-example, consider the clause:

$$\begin{aligned} C=P(A,B) \leftarrow Q(A,D), R(A,D), S(B,C), T(B,C) \end{aligned}$$

Removing any non-empty subset of literals from the body of C leaves a singleton variable in the remaining clause, so the result is not singleton-free. Moreover, for any other clause to subsume C it must be more general than C, which is again impossible because of the singleton-free constraint.Footnote 18 \(\square \)

We can likewise show that this result holds in the general case:

Theorem 12

(\({{{\mathscr {K}}}}^{a}_{\infty }\) S-irreducibility) For \(a\ge 2\), the fragment \({{{\mathscr {K}}}}^{a}_{\infty }\) does not have a \({{{\mathscr {K}}}}^{a}_{2a-1}\)-S-reduction.

Proof

We generalise the clause C from the proof of Proposition 16 to the clause \(C_a = P(A_1,\dots ,A_a)\leftarrow P_1(A_1,B_1),P_2(A_1,B_1),\dots ,P_{2a-1}(A_a,B_a),P_{2a}(A_a,B_a)\). The same reasoning applies to \(C_a\) as to \(C (= C_2)\), making \(C_a\) irreducible in \({{{\mathscr {K}}}}^{a}_{\infty }\). Moreover, \(C_a\) has body size 2a, so \(C_a\) is a counterexample to a \({{{\mathscr {K}}}}^{a}_{2a-1}\)-S-reduction of \({{{\mathscr {K}}}}^{a}_{\infty }\). \(\square \)

However, every fragment \({{{\mathscr {K}}}}^{a}_{\infty }\) can be E-reduced to \({{{\mathscr {K}}}}^{a}_{2}\):

Theorem 13

(\({{{\mathscr {K}}}}^{a}_{\infty }\) E-reducibility) For \(a>0\), the fragment \({{{\mathscr {K}}}}^{a}_{\infty }\) has a \({{{\mathscr {K}}}}^{a}_{2}\)-E-reduction.

Proof

The proof of Theorem 13 is an adaptation of that of Theorem 11. The only difference is that if \(n=1\) then \(P(A_1)\leftarrow L_1,L_1\) must be considered instead of \(P(A_1)\leftarrow L_1\) to ensure the absence of singleton variables in the body of the clause, and, for the same reason, in the general case the clause \(D'=P(A_1,\dots ,A_n)\leftarrow L_1,\dots ,L_n\) must be replaced by \(D'=P(A_1,\dots ,A_n)\leftarrow L_1,L_1,\dots ,L_n,L_n\). Note that \(C'\) is not modified and thus may or may not belong to \({{{\mathscr {K}}}}^{a}_{\infty }\); however, it is enough that \(C'\in {{{\mathscr {D}}}}^{a}_{\infty }\). With these modifications, the proof carries over from \({{{\mathscr {D}}}}^{a}_{\infty }\) and \({{{\mathscr {D}}}}^{a}_{2}\) to \({{{\mathscr {K}}}}^{a}_{\infty }\) and \({{{\mathscr {K}}}}^{a}_{2}\), including the results of Lemma 3. \(\square \)

5.3.1 Summary

Table 12 summarises our theoretical results from this section. Theorem 12 shows that for \(a\ge 2\), the fragment \({{{\mathscr {K}}}}^{a}_{\infty }\) does not have a \({{{\mathscr {K}}}}^{a}_{2a-1}\)-S-reduction. This result contrasts with the Datalog fragment, where \({{{\mathscr {D}}}}^{a}_{\infty }\) always has a \({{{\mathscr {D}}}}^{a}_{a}\)-S-reduction. As is becoming clear, adding more restrictions to a fragment typically results in less S-reducibility. By contrast, as with the connected and Datalog fragments, Theorem 13 shows that the fragment \({{{\mathscr {K}}}}^{a}_{\infty }\) always has a \({{{\mathscr {K}}}}^{a}_{2}\)-E-reduction. In addition, as with the other fragments, \({{{\mathscr {K}}}}^{a}_{\infty }\) has no D-reduction for \(a\ge 2\).

Table 12 Existence of a S-, E- or D-reduction of \({{{\mathscr {K}}}}^{a}_{\infty }\) to \({{{\mathscr {K}}}}^{a}_{2}\)

5.4 Duplicate-free (\({{{\mathscr {U}}}}^{a}_{m}\)) results

The previous three fragments are general in the sense that they have been widely used in ILP. By contrast, the final fragment that we consider is of particular interest to MIL. Table 1 shows a selection of metarules commonly used in the MIL literature. These metarules have been used successfully despite having no theoretical justification. However, if we consider the reductions of the three fragments so far, the identity, precon, and postcon metarules do not appear in any reduction. These metarules can be derived from the reductions, typically using either the \(P(A) \leftarrow Q(A,A)\) or \(P(A,A) \leftarrow Q(A)\) metarule. To try to identify a reduction that more closely matches the metarules shown in Table 1, we consider a fragment that excludes clauses in which a literal contains multiple occurrences of the same variable. For instance, this fragment excludes the previously mentioned metarules and also the metarule \(P(A,A) \leftarrow Q(B,A)\), which was in the D-reduction shown in Table 5. We call this fragment duplicate-free. It is a sub-fragment of \({{{\mathscr {K}}}}^{a}_{m}\) and we denote it as \({{{\mathscr {U}}}}^{a}_{m}\).
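The duplicate-free condition is likewise simple to check; the sketch below (ours, same representation as before) rejects the metarules mentioned above.

```python
# Illustrative sketch: a clause is duplicate-free if no single literal
# contains two occurrences of the same variable.
def is_duplicate_free(clause):
    head, body = clause
    return all(len(set(args)) == len(args) for (_, args) in [head] + body)

# P(A,A) :- Q(B,A) is excluded: the head holds A twice.
print(is_duplicate_free((("P", ("A", "A")), [("Q", ("B", "A"))])))  # False
```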

Table 13 shows the reductions for the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\). Reductions for other duplicate-free fragments are in Appendix “A.4”. As Table 13 shows, the D-reduction of \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\) contains some metarules commonly used in the MIL literature. For instance, it contains the \({\textit{identity}}_1\), \({\textit{didentity}}_2\), and precon metarules. We use the metarules shown in Table 13 in Experiments 1 and 2 (Sects. 6.1 and 6.2) to learn Michalski trains solutions and string transformation programs respectively.

Table 14 shows the results of applying the reduction algorithms to \({{{\mathscr {U}}}}^{a}_{5}\) for different values of a. All the theoretical results that hold for the singleton-free fragments hold similarly for the duplicate-free fragments for the following reasons:

  • (S) The clauses in the proofs of Proposition 16 and Theorem 12 belong to \({{{\mathscr {U}}}}^{a}_{\infty }\).

  • (E) If the clause C considered initially in the proof of Theorem 13 belongs to \({{{\mathscr {U}}}}^{a}_{\infty }\), then all the subsequent clauses in that proof are also duplicate-free.

  • (D) In the proof of Theorem 7, the clauses in the \(C_{I_m}\) family all belong to \({{{\mathscr {U}}}}^{2}_{\infty }\), and hence to \({{{\mathscr {U}}}}^{a}_{\infty }\) for \(a\ge 2\).

Thus Table 12 is also a summary of the S-, E- and D-reduction results of \({{{\mathscr {U}}}}^{a}_{\infty }\) to \({{{\mathscr {U}}}}^{a}_{2}\).

Table 13 Reductions of the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\)
Table 14 Cardinality and body size of the reductions of \({{{\mathscr {U}}}}^{a}_{5}\)

5.5 Summary

We started this section with three goals (G1, G2, and G3). Table 15 summarises the results towards these goals for fragments of metarules relevant to ILP (Table 3). For G1, our results are mostly empirical, i.e. the results are the outputs of the reduction algorithms. For G2, Table 15 shows that the results are all positive for E-reduction, but mostly negative for S- and D-reduction, especially for Datalog fragments. Similarly, for G3 the results are again positive for E-reduction but negative for S- and D-reduction for Datalog fragments. We discuss the implications of these results in Sect. 7.

Table 15 Existence of a S-, E- or D-reduction of \({{{\mathscr {M}}}}^{{a}}_{\infty }\) to \({{{\mathscr {M}}}}^{{a}}_{2}\)

6 Experiments

As explained in Sect. 1, deciding which metarules to use for a given learning task is a major open problem. The problem is the trade-off between efficiency and expressivity: the hypothesis space grows given more metarules (Theorem 1), so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. In this section we experimentally explore this trade-off. As described in Sect. 2, Cropper and Muggleton (2014) showed that learning with E-reduced sets of metarules can lead to higher predictive accuracies and lower learning times compared to learning with non-E-reduced sets. However, as argued in Sect. 1, we claim that E-reduction is not always the most suitable form of reduction because it can remove metarules necessary to learn programs with the appropriate specificity. To test this claim, we now conduct experiments that compare the learning performance of Metagol 2.3.0,Footnote 19 the main MIL implementation, when given different reduced sets of metarules.Footnote 20 We test the null hypothesis:

  • Null hypothesis 1 There is no difference in the learning performance of Metagol when using different reduced sets of metarules.

To test this null hypothesis, we consider three domains: Michalski trains, string transformations, and game rules.

6.1 Michalski trains

In the Michalski trains problems (Larson and Michalski 1977) the task is to induce a program that distinguishes eastbound trains from westbound trains. Figure 1 shows an example target program, where the target concept (f/1) is that the train has a long carriage with two wheels and another with three wheels.

Fig. 1
figure 1

An example Michalski trains target program. In the Michalski trains domain, a carriage (car) can be long or short. A short carriage always has two wheels. A long carriage has either two or three wheels

6.1.1 Materials

To obtain the experimental data, we first generated 8 random target train programs of progressively increasing difficulty, where difficulty is measured by the number of literals in the generated program, from the easiest task \(\hbox {T}_1\) to the most difficult task \(\hbox {T}_8\). Figure 2 shows the background predicates available to Metagol. We vary the metarules given to Metagol: we use the S-, E-, and D-reductions of the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\) (Table 13). We also consider the \({{{\mathscr {U}}}}^{\{1,2\}}_{2}\) fragment of the D-reduction of \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\), i.e. a subset of the D-reduction consisting only of metarules with at most two body literals. This fragment, which we denote as \(D^{*}\), contains three fewer metarules than the D-reduction of \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\). Table 16 shows this fragment.

Fig. 2
figure 2

Background relations available in the trains experiment

Table 16 The \(\hbox {D}^*\) fragment, which is the D-reduction of the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\) restricted to the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{2}\)

6.1.2 Method

For each train task \(t_i\) in \(\{T_1,\dots ,T_8\}\):

  1. Generate 10 training examples of \(t_i\), half positive and half negative.

  2. Generate 200 testing examples of \(t_i\), half positive and half negative.

  3. For each set of metarules m in the S-, E-, D-, and \(D^*\)-reductions:

    (a) Learn a program for task \(t_i\) using the training examples and metarules m.

    (b) Measure the predictive accuracy of the learned program using the testing examples.

If a program is not found in 10 min then no program is returned and every testing example is deemed to have failed. We measure mean predictive accuracies, mean learning times, and standard errors over 10 repetitions.

Table 17 Predictive accuracies when using different reduced sets of metarules on the Michalski trains problems
Table 18 Learning times in seconds when using different reduced sets of metarules on the Michalski trains problems
Fig. 3
figure 3

Example programs learned by Metagol when varying the metarule set. The target program is shown in Fig. 1

6.1.3 Results

Table 17 shows the predictive accuracies when learning with the different sets of metarules. The D set generally outperforms the S and E sets with a higher mean accuracy of 88% versus 80% and 73% respectively. Moreover, the \(D^*\) set easily outperforms them all with a mean accuracy of 100%. A McNemar’s testFootnote 21 on the D and \(D^*\) accuracies confirmed the significance at the \(p < 0.01\) level.

Table 18 shows the corresponding learning times when using different reduced sets of metarules. The D set outperforms (has lower mean learning time) the S and E sets, and again the \(D^*\) set outperforms them all. A paired t-testFootnote 22 on the D and \(D^*\) learning times confirmed the significance at the \(p < 0.01\) level.

The \(D^*\) set performs particularly well on the more difficult tasks. The poor performance of the S and E sets on the more difficult tasks is for one of two reasons. The first reason is that the S- and E-reduction algorithms have removed the metarules necessary to express the target concept. This observation strongly corroborates our claim that E-reduction can be too strong because it can remove metarules necessary to specialise a clause. The second reason is that the S- and E-reduction algorithms produce sets of metarules that are still sufficient to express the target theory but doing so requires a much larger and more complex program, measured by the number of clauses needed.

The performance discrepancy between the D and \(D^*\) sets of metarules can be explained by comparing the hypothesis spaces searched. For instance, when searching for a program with 3 clauses, Theorem 1 shows that when using the D set of metarules the hypothesis space contains approximately \(10^{24}\) programs. By contrast, when using the \(D^*\) set of metarules the hypothesis space contains approximately \(10^{14}\) programs. As explained in Sect. 3.2, assuming that the target hypothesis is in both hypothesis spaces, the Blumer bound (Blumer et al. 1987) tells us that searching the smaller hypothesis space will result in less error, which helps to explain these empirical results. Of course, there is the potential for the \(D^*\) set to perform worse than the D set when the target theory requires the three removed metarules, but we did not observe this situation in this experiment.

Figure 3 shows the target program for \(\hbox {T}_8\) and example programs learned by Metagol using the various reduced sets of metarules. Only the \(\hbox {D}^*\) program is success set equivalentFootnote 23 to the target program when restricted to the target predicate f/1. In all three cases Metagol discovered that if a carriage has three wheels then it is a long carriage, i.e. Metagol discovered that the literal long(C2) is redundant in the target program. Indeed, if we unfold the \(\hbox {D}^*\) program to remove the invented predicates then the resulting single clause program is one literal shorter than the target program.

Overall, the results from this experiment suggest that we can reject the null hypothesis, both in terms of predictive accuracies and learning times.

6.2 String transformations

In Lin et al. (2014) and Cropper and Muggleton (2019) the authors evaluate Metagol on 17 real-world string transformation tasks using a predefined (hand-crafted) set of metarules. In this experiment, we compare learning with different metarules on an expanded dataset with 250 string transformation tasks.

6.2.1 Materials

Each string transformation task has 10 examples. Each example is an atom of the form f(x,y), where f is the task name and x and y are strings. Table 19 shows task p6, where the goal is to learn a program that filters the capital letters from the input. We supply Metagol with dyadic background predicates, such as tail, dropLast, reverse, filter_letter, filter_uppercase, dropWhile_not_letter, and takeWhile_uppercase. The full details can be found in the code repository. We vary the metarules given to Metagol: we use the S-, E-, and D-reductions of the fragment \({{{\mathscr {U}}}}^{\{2\}}_{5}\), and, as before, the D-reduction of \({{{\mathscr {U}}}}^{\{2\}}_{5}\) restricted to the fragment \({{{\mathscr {U}}}}^{\{2\}}_{2}\), again denoted as \(D^*\).

Table 19 Examples of the p6 string transformation problem input–output pairs

6.2.2 Method

Our experimental method is:

  1. Sample 50 tasks Ts from the set \(\{p1,\dots ,p250\}\).

  2. For each \(t \in Ts\):

    (a) Sample 5 training examples and use the remaining examples as testing examples.

    (b) For each set of metarules m in the S-, E-, D-, and \(D^*\)-reductions:

      i. Learn a program p for task t using the training examples and metarules m.

      ii. Measure the predictive accuracy of p using the testing examples.

If a program is not found in 10 min then no program is returned and every testing example is deemed to have failed. We measure mean predictive accuracies, mean learning times, and standard errors over 10 repetitions.

6.2.3 Results

Table 20 shows the mean predictive accuracies and learning times when learning with the different sets of metarules. Note that we are not interested in the absolute predictive accuracy, which is limited by factors such as the low timeout and insufficiency of the BK. We are instead interested in the relative accuracies. Table 20 shows that the D set outperforms the S and E sets, with a higher mean accuracy of 33%, versus 22% and 22% respectively. The \(D^*\) set outperforms them all with a mean accuracy of 56%. A McNemar’s test on the D and \(D^*\) accuracies confirmed the significance at the \(p < 0.01\) level.

Table 20 shows the corresponding learning times when varying the metarules. Again, the D set outperforms the S and E sets, and again the \(D^*\) set outperforms them all. A paired t-test on the D and \(D^*\) learning times confirmed the significance at the \(p < 0.01\) level.

Overall, the results from this experiment give further evidence to reject the null hypothesis, both in terms of predictive accuracies and learning times.

Table 20 Experimental results on the string transformation problems

6.3 Inducing game rules

The general game playing (GGP) framework (Genesereth et al. 2005) is a system for evaluating an agent’s general intelligence across a wide range of tasks. In the GGP competition, agents are tested on games they have never seen before. In each round, the agents are given the rules of a new game. The rules are described symbolically as a logic program. The agents are given a few seconds to think, to process the rules of the game, and to then start playing, thus producing game traces. The winner of the competition is the agent who gets the best total score over all the games. In this experiment, we use the IGGP dataset (Cropper et al. 2019) which inverts the GGP task: an ILP system is given game traces and the task is to learn a set of rules (a logic program) that could have produced these traces.

6.3.1 Materials

The IGGP dataset contains problems drawn from 50 games. We focus on the eight games shown in Table 21, which contain BK compatible with the metarule fragments we consider (i.e. the BK contains predicates in the fragment \({{{\mathscr {M}}}}^{{2}}_{m}\)). The other games contain predicates with arity greater than two. Each game has four target predicates legal, next, goal, and terminal, where the arities depend on the game. Figure 4 shows the target solution for the next predicate for the minimal decay game. Each game contains training/validate/test data, composed of sets of ground atoms, in a 4:1:1 split. We vary the metarules given to Metagol: we use the S-, E-, and D-reductions of the fragment \({{{\mathscr {D}}}}^{\{1,2\}}_{5}\), and, as before, the D-reduction of \({{{\mathscr {D}}}}^{\{1,2\}}_{5}\) restricted to the fragment \({{{\mathscr {D}}}}^{\{1,2\}}_{2}\), again denoted as \(D^*\).

Table 21 IGGP games used in the experiments
Fig. 4
figure 4

Target solution for the next predicate for the minimal decay game

6.3.2 Method

The majority of game examples are negative. We therefore use balanced accuracy to evaluate the approaches. Given background knowledge B, sets of positive \(E^+\) and negative \(E^-\) testing examples, and a logic program H, we define the number of positive examples as \(p=|E^+|\), the number of negative examples as \(n=|E^-|\), the number of true positives as \(tp=|\{e \in E^+ | B \cup H \models e\}|\), the number of true negatives as \(tn=|\{e \in E^- | B \cup H \not \models e\}|\), and the balanced accuracy \(ba = (tp/p + tn/n)/2\).
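As a concrete rendering of this definition, the following sketch (ours) computes the balanced accuracy from the testing examples and an entailment test; entails is a hypothetical callable, e.g. a wrapper around a Prolog query deciding \(B \cup H \models e\).

```python
# Illustrative sketch: balanced accuracy as defined above.
# `entails` is a hypothetical predicate deciding B ∪ H ⊨ e.
def balanced_accuracy(pos, neg, entails):
    tp = sum(1 for e in pos if entails(e))      # true positives
    tn = sum(1 for e in neg if not entails(e))  # true negatives
    return (tp / len(pos) + tn / len(neg)) / 2
```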

Our experimental method is as follows. For each game g, each task \(g_t\), and each set of metarules m in the S-, E-, D-, and \(D^*\)-reductions:

  1. Learn a program p using all the training examples for \(g_t\) and the metarules m, with a timeout of 10 min.

  2. Measure the balanced accuracy of p using the testing examples.

If no program is found in 10 min then no program is returned and every testing example is deemed to have failed.

6.3.3 Results

Table 22 shows the balanced accuracies when learning with the different sets of metarules. Again, we are not interested in the absolute accuracies, only in the relative differences when learning with different sets of metarules. The D set outperforms the S and E sets with a higher mean accuracy of 72%, versus 66% and 66% respectively. The \(D^*\) set again outperforms them all with a mean accuracy of 73%. A McNemar's test on the D and \(D^*\) accuracies confirmed the significance at the \(p < 0.01\) level. Table 22 also shows the corresponding learning times when varying the metarules. Again, the D set outperforms the S and E sets, and again the \(D^*\) set outperforms them all. However, a paired t-test on the D and \(D^*\) learning times confirmed significance only at the \(p < 0.08\) level, so the difference in learning times is not statistically significant. Overall, the results from this experiment suggest that we can reject the null hypothesis in terms of predictive accuracies but not learning times.

Table 22 Experimental results on the IGGP data

7 Conclusions and further work

As stated in Sect. 1, despite the widespread use of metarules, there is little work determining which metarules to use for a given learning task. Instead, suitable metarules are assumed to be given as part of the background knowledge, or are used without any theoretical justification. Deciding which metarules to use for a given learning task is a major open challenge (Cropper 2017; Cropper and Muggleton 2014) and is a trade-off between efficiency and expressivity: the hypothesis space grows given more metarules (Cropper and Muggleton 2014; Lin et al. 2014), so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. To address this issue, Cropper and Muggleton (2014) used E-reduction on sets of metarules and showed that learning with E-reduced sets of metarules can lead to higher predictive accuracies and lower learning times compared to learning with non-E-reduced sets. However, as we claimed in Sect. 1, E-reduction is not always the most appropriate form of reduction because it can remove metarules necessary to learn programs with the appropriate specificity.

To support our claim, we have compared three forms of logical reduction: S-, E-, and D-reduction, where the latter is a new form of reduction based on SLD-derivations. We have used the reduction algorithms to reduce finite sets of metarules. Table 15 summarises the results. We have shown that many sets of metarules relevant to ILP do not have finite reductions (Theorem 7). These negative results have direct (negative) implications for MIL. Specifically, our results mean that, in certain cases, a MIL system such as Metagol or HEXMIL (Kaminski et al. 2018) cannot be given a finite set of metarules from which it can learn any program, such as when learning arbitrary Datalog programs. The results will also likely have implications for other forms of ILP that rely on metarules.

Our experiments compared the learning performance of Metagol when using the different reduced sets of metarules. In general, the D-reduced set outperforms both the S- and E-reduced sets in terms of predictive accuracy and learning time. Our experimental results give strong evidence for our claim. We also compared a \(D^*\)-reduced set, a subset of the D-reduced metarules, which, although derivationally incomplete, outperforms the other sets in terms of predictive accuracies and learning times.

7.1 Limitations and future work

Theorem 7 shows that certain fragments of metarules do not have finite D-reductions. However, our experimental results show that using D-reduced sets of metarules leads to higher predictive accuracies and lower learning times compared to the other forms of reduction. Therefore, our work now opens up a new challenge of overcoming this negative theoretical result. One idea is to explore whether special metarules, such as a currying metarule (Cropper and Muggleton 2016a), could alleviate the issue.

In future work we would also like to reduce more general fragments of logic, such as triadic logics, which would allow us to tackle a wider variety of problems, such as more of the games in the IGGP dataset.

We have compared the learning performance of Metagol when using different reduced sets of metarules. However, we have not investigated whether these reductions are optimal. For instance, when considering derivation reductions, it may, in some cases, be beneficial to re-add redundant metarules to the reduced sets to avoid having to derive them through SLD-resolution. In future work, we would like to investigate identifying an optimal set of metarules for a given learning task, or preferably learning which metarules to use for a given learning task.

We have shown that, although incomplete, the \(D^*\)-reduced set of metarules outperforms the other reductions. In future work we would like to explore other methods that sacrifice completeness for efficiency.

We have used the logical reduction techniques to remove redundant metarules. It may also be beneficial to simultaneously reduce metarules and standard background knowledge. The idea of purposely removing background predicates is similar to dimensionality reduction, widely used in other forms of machine learning (Skillicorn 2007), but under-researched in ILP (Fürnkranz 1997). Initial experiments indicate that this is possible (Cropper 2017; Cropper and Muggleton 2014), and we aim to develop this idea in future work.