1 Introduction

There is overwhelming empirical evidence that many indicative conditionals are appropriate only if the corresponding conditional probability is high. It is also clear, however, that demanding a high conditional probability is not enough for the appropriateness of many such conditionals; a dependence relation is required as well. In recent work, Douven (2008, 2016) and Skovgaard-Olsen et al. (2016) proposed that the appropriateness of these indicative conditionals requires two independent conditions to be met: not only should the conditional probability be high, but there should also be a dependence relation. In this paper we propose, instead, that a single condition will do: the relation between antecedent and consequent should be a causal one.

In Sect. 2 we propose to make use of a notion of ‘relative difference’ to account for the required dependency relation between antecedents and consequents of (many) indicative conditionals. In Sect. 3 we show how this notion can be derived from ‘deeper’ causal assumptions, making use of causal powers. In Sects. 4 and 5 we show how under natural conditions our causal analysis can explain why many indicative conditionals demand a high conditional probability in order to be appropriate. First, we show this for causal ‘forward’ conditionals (Sect. 4), followed by an explanation for causal ‘backward’ diagnostic conditionals (Sect. 5). Section 6 concludes the paper.

2 A Dependence Requirement for Conditionals

To what degree would you believe the following sentences, given that a card has been picked at random from a standard 52-card deck?

(1)

  a. The selected card is a king, if it's red.

  b. It's clubs, if it's black.

  c. It's spades, if it is a nine.

The obvious answers are \(\frac{1}{13}, \frac{1}{2}\), and \(\frac{1}{4}\), respectively. This suggests that one's degree of belief in a conditional sentence ‘If i, then e’ should equal one's conditional probability of the consequent given the antecedent, P(e|i). This cannot be accounted for without further ado by saying that the conditional belief is simply the probability of the material implication being true, \(P(i \rightarrow e)\), because in most circumstances \(P(i \rightarrow e)\) is strictly higher than P(e|i). Perhaps there is another conditional connective \(\Rightarrow \) for which it holds that \(P(i \Rightarrow e) = P(e|i)\). The idea that we should interpret conditionals in terms of conditional probabilities had, for related reasons, already been proposed by Adams (1965). The further hypothesis that there is a binary connective ‘\(\Rightarrow \)’ such that \(i \Rightarrow e\) expresses a proposition for which \(P(i \Rightarrow e) = P(e|i)\) was explicitly made by Stalnaker (1970). Unfortunately, Lewis's (1976) famous triviality result shows that Stalnaker's hypothesis cannot be upheld together with some other—seemingly natural—assumptions.Footnote 1
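These values are easy to verify by brute-force enumeration; a minimal Python sketch (our own illustration, with hypothetical helper names):

```python
from fractions import Fraction

# A standard 52-card deck as (rank, suit) pairs.
ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits = ['hearts', 'diamonds', 'spades', 'clubs']
deck = [(r, s) for r in ranks for s in suits]

def cond_prob(consequent, antecedent):
    """P(consequent | antecedent), computed as a relative frequency."""
    given = [c for c in deck if antecedent(c)]
    return Fraction(sum(1 for c in given if consequent(c)), len(given))

red   = lambda c: c[1] in ('hearts', 'diamonds')
black = lambda c: c[1] in ('spades', 'clubs')
print(cond_prob(lambda c: c[0] == 'K', red))        # 1/13
print(cond_prob(lambda c: c[1] == 'clubs', black))  # 1/2
print(cond_prob(lambda c: c[1] == 'spades',
                lambda c: c[0] == '9'))             # 1/4
```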

One natural way to solve the above problems would be to say that conditionals of the form ‘If i, then e’ simply don't express propositions. They are assertable not because the speaker believes, with high probability, a proposition expressed by the sentence, but simply because of the speaker's conditional belief in e given i, P(e|i). On this proposal (defended by Adams (1965), Gibbard (1981), Edgington (1995), Bennett (2003) and others), a natural language indicative conditional of the form ‘If i, then e’ does not express a proposition, and its assertability, or appropriateness, depends on P(e|i).Footnote 2 This proposal is compatible with the view that indicative conditionals can be (taken to be) true or false in some situations. In particular, one can adopt de Finetti's (1936/1995) proposal [as do psychologists like Over and Evans (2003) and Kleiter et al. (2018), philosophers (Belnap 1970; Milne 2004) and linguists (Huitink 2008)], according to which ‘If i, then e’ is taken to be true if i and e hold, false if i holds but not e, and to have no (classical) truth value otherwise. The probability P(e|i) then measures the conditional probability that the conditional is true (the ‘\(i \wedge e\)’ outcome) given that the conditional has a (classical) truth value at all (the i case), since \(P(i \wedge e|i) = P(e|i)\).Footnote 3

Sentences are only assertable if one believes them strongly enough. It thus seems natural to demand that one can only assert a conditional if one's conditional probability of e given i is above a certain threshold. Let us say, for concreteness, that this threshold is 0.6. Thus, we assume that the acceptability of a conditional goes by conditional probability, and we demand for assertability that the conditional probability be higher than 0.6. Suppose, however, that \(P(e|i) = 0.7\)—and thus above the threshold—but that \(P(e|\lnot i) = 0.8\).Footnote 4 In that case the conditional is predicted to be appropriate, or assertable, although the antecedent makes the consequent only less likely. This seems wrong. Of course, one could respond that we should simply increase the required threshold for the conditional probability. Unfortunately, for each threshold below 1 a similar problem can be constructed. Only by demanding that \(P(e|i) = 1\) could this problem be avoided. But requiring absolute certainty for the appropriate use of a conditional just seems too demanding. Moreover, in case \(P(e) = 1\), and thus also \(P(e|\lnot i) = 1\), the conditional still seems inappropriate, even if it is considered true. The reason is that even this strong proposal still has a hard time explaining what is wrong with the following conditional

(2) If it is sunny today, Jan Ullrich won the Tour de France in 1997,

especially given that Jan Ullrich won the Tour in 1997. There is general agreement as to what the problem is: there is no dependence between antecedent and consequent in this case (e.g. Krzyzanowska et al. 2013). Of course, there are many indicative conditionals that are appropriate although there is no dependence; in particular, concessive conditionals, biscuit conditionals, and ‘even if’ conditionals. The use of ‘then’ in these conditionals, however, does not seem to be appropriate (cf. Iatridou 1994). So we will limit our discussion in this paper to those kinds of indicative conditionals that can be reformulated with an explicit ‘then’ in the consequent without (much) change of meaning. We propose that for (indicative) conditionals of that type, a dependency between antecedent and consequent is required for the sentence to be appropriate. Skovgaard-Olsen et al. (2017) argue that such an appropriateness condition is of a pragmatic rather than of a semantic nature.

Within classical learning-by-conditioning psychology, learning a dependency between two events e and i is measured in terms of the contingency \(\Delta P^{e}_{i}\) of one event on the other: \(\Delta P^{e}_{i} = P(e|i) - P(e|\lnot i)\), where P measures frequencies (e.g., Shanks 1995). Contingency does not simply measure whether the probability of e given i is high, but whether it is high compared to the probability of e given all other (contextually relevant) alternatives to i (\(\lnot i\) abbreviates \(\bigcup Alt(i)\)). Thus, \(\Delta P^{e}_{i}\) measures how representative or typical e is for i.

The obvious solution to our problem with (2) would be to demand that one can appropriately assert a conditional of the form ‘If i, then e’ only if \(\Delta P^e_i > 0\).Footnote 5 Consider the case where \(P(e|\lnot i) = 0.8\) and \(P(e|i) = 0.9\). We have suggested that in such a case it is acceptable to assert the conditional. But notice that now \(\Delta P^e_i = P(e|i) - P(e|\lnot i)\) is only 0.1. Yes, this is positive, but that by itself doesn't seem enough. If it were enough, the conditional would also be acceptable in case, say, \(P(e|\lnot i) = 0.1\) and \(P(e|i) = 0.2\), which seems absurd. To solve this problem, it is natural to demand for the acceptable use of a conditional not only that (i) \(P(e|i) > P(e|\lnot i)\), but also that (ii) P(e|i) is higher than, say, 0.6. In fact, this is basically the proposal of Douven (2008, 2016). Unfortunately, even the satisfaction of those two conditions doesn't seem to be quite enough. Suppose, for instance, that \(P(e|i) = 0.61\) and \(P(e|\lnot i) = 0.6\). Although both conditions are now satisfied, we doubt that this would be enough for the conditional to be assertable. This suggests that either the difference between P(e|i) and \(P(e|\lnot i)\) should be larger, or the value of P(e|i) should be higher. Suppose, for instance, that we demand that \(P(e|i) - P(e|\lnot i)\) be at least 0.2. That would be too strong a demand: it would rule out many acceptable conditionals. For instance, if \(P(e|i) = 0.95\) and \(P(e|\lnot i) = 0.9\), the conditional seems good, but is not predicted to be so. We fear that any absolute threshold on \(\Delta P^e_i\) will face a similar problem. Perhaps, then, we should demand that P(e|i) be higher still. However, for any specific demand on P(e|i), a small enough difference between P(e|i) and \(P(e|\lnot i)\) can be found that gives rise to a similar problem as the one above.

This suggests that the demanded difference between P(e|i) and \(P(e|\lnot i)\) should depend on the conditional probability P(e|i): the higher P(e|i), the smaller the difference between P(e|i) and \(P(e|\lnot i)\) needs to be for the conditional still to be acceptable. One way to account for this is to demand that \([P(e|i) - P(e|\lnot i)] \times P(e|i)\) be high. On this measure, however, the value of P(e|i) does not count for more than the value of \(P(e|\lnot i)\), whereas experimental work by Skovgaard-Olsen et al. (2016), among many others, found that it should. To assure that P(e|i) counts for more, we could demand that the conditional is appropriate only if \(\alpha P(e|i) > \beta P(e|\lnot i)\), with \(\alpha > \beta \). Unfortunately, for any values of \(\alpha \) and \(\beta \) this condition can hold although \(P(e|i) < P(e|\lnot i)\), which is undesirable. As it turns out, there exists a standard way to guarantee both desirable features: (i) \(P(e|i) > P(e|\lnot i)\), and (ii) P(e|i) counts for more than \(P(e|\lnot i)\). It is to make use of the measure \(\Delta ^* P^e_i\) that epidemiologists (Shep 1958) call the ‘relative difference’, and to demand the following:Footnote 6

$$\begin{aligned} \begin{array}{l} { (CON)} \ \ \hbox {`If }i,\hbox { then }e\hbox {' is assertable}\quad \hbox { iff }\quad \Delta ^* P^e_i \hbox { is high},^{7}\\ \\ \qquad \qquad \qquad \qquad \qquad \hbox {with}\\ \\ \Delta ^* P^e_i =_{\,df} \frac{P(e|i) - P(e|\lnot i)}{1 - P(e|\lnot i)} = \frac{\Delta P^{e}_{i}}{1 - P(e|\lnot i)} . \end{array} \end{aligned}$$
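To see how (CON) handles the cases discussed above, here is a minimal Python sketch (our own illustration, using the numbers from the text):

```python
def delta_star(p_e_i, p_e_not_i):
    """Relative difference: [P(e|i) - P(e|not-i)] / [1 - P(e|not-i)]."""
    return (p_e_i - p_e_not_i) / (1 - p_e_not_i)

print(delta_star(0.9, 0.8))   # 0.5    : acceptable despite a contingency of only 0.1
print(delta_star(0.61, 0.6))  # 0.025  : very low, hence unassertable, as desired
print(delta_star(0.95, 0.9))  # 0.5    : high again, matching intuition
print(delta_star(0.2, 0.1))   # 0.111..: low, so no longer wrongly licensed
```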


\(\Delta ^* P^e_i\) is defined in terms of one familiar notion of relevance, namely contingency. But one can prove that \(\Delta ^* P^e_i\) is also closely related to the more standard notion of relevance, \(P(e|i) - P(e)\), because it can equivalently be formulated as follows:Footnote 8

$$\begin{aligned} \Delta ^* P^e_i = \frac{P(e|i) - P(e)}{P(\lnot i \wedge \lnot e)}. \end{aligned}$$
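To see why the two formulations coincide, write \(P(e) = P(e|i)P(i) + P(e|\lnot i)P(\lnot i)\). Then

$$\begin{aligned} P(e|i) - P(e)&= P(e|i)P(\lnot i) - P(e|\lnot i)P(\lnot i) = P(\lnot i) \times \Delta P^e_i,\\ P(\lnot i \wedge \lnot e)&= P(\lnot e|\lnot i)P(\lnot i) = [1 - P(e|\lnot i)] \times P(\lnot i), \end{aligned}$$

so the common factor \(P(\lnot i)\) cancels in the quotient, and we indeed obtain \(\frac{\Delta P^e_i}{1 - P(e|\lnot i)} = \Delta ^* P^e_i\).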

Obviously, for a conditional to be appropriate it is now demanded that \(\Delta P^e_i = P(e|i) - P(e|\lnot i) > 0\), or equivalently \(P(e|i) - P(e) > 0\). In this way we can account for the required dependency between antecedent and consequent. But our measure \(\Delta ^* P^e_i\) also accounts for the following two intuitions:

  1. The required difference between P(e|i) and \(P(e|\lnot i)\) (or P(e)) for the conditional to be acceptable should depend on the conditional probability \(P(e|\lnot i)\) (or P(e)): the higher \(P(e|\lnot i)\) (or P(e)), the smaller the difference between P(e|i) and \(P(e|\lnot i)\) (or P(e)) is required to be for the conditional to be assertable.

  2. The value of P(e|i) counts for more than the value of \(P(e|\lnot i)\).

To see that the measure of relative difference satisfies the first intuition, fix the difference between P(e|i) and \(P(e|\lnot i)\) at x, i.e., \(P(e|i) - P(e|\lnot i) = x\), with \(x > 0\). Now \(\Delta ^* P^e_i\) gets higher as \(P(e|\lnot i)\) increases. But since the difference is fixed, \(P(e|\lnot i)\) can only increase if P(e|i) increases as well. Of course, the other way around works similarly: for any fixed positive value of \(\Delta P^e_i\), the value of \(\Delta ^* P^e_i\) increases with an increase of P(e|i). The reason is that with an increase of P(e|i) and a fixed value of \(P(e|i) - P(e|\lnot i)\), the value of \(P(e|\lnot i)\) will also increase, and thus the value of \(\frac{\Delta P^{e}_{i}}{1 - P(e|\lnot i)}\) as well.

To see that the measure of relative difference guarantees that the value of P(e|i) counts for more than the value of \(P(e|\lnot i)\), let us compare the effect of P(e|i) going up by 0.05 for a fixed value of \(P(e|\lnot i)\) with the effect of \(P(e|\lnot i)\) going down by 0.05 for a fixed value of P(e|i). An illustration will suffice.Footnote 9 Let us fix the mid-point between P(e|i) and \(P(e|\lnot i)\) at 0.9. Then an increase of P(e|i) from 0.95 to 1 (keeping \(P(e|\lnot i)\) fixed at 0.85) raises our measure from 0.67 to 1, an increase of 0.33 points. But a decrease of the same amount of \(P(e|\lnot i)\), from 0.85 to 0.80 (keeping P(e|i) fixed at 0.95), raises our measure from 0.67 to only 0.75, an increase of less than 0.1 points.

To get a feeling for the workings of the measure \(\Delta ^* P^e_i\), we can compare it with the more straightforward measure \([P(e|i) - P(e|\lnot i)] \times P(e|i)\). One can easily see that for any fixed value of \(P(e|i) - P(e|\lnot i)\), the height of P(e|i) has a larger effect on \(\Delta ^* P^e_i\) than on the straightforward measure. Indeed, in case \(P(e|i) - P(e|\lnot i) = 0.1\), an increase of P(e|i) from, say, 0.9 to 1 has only a marginal effect of \(0.1 - 0.09 = 0.01\) on the straightforward proposal. On \(\Delta ^* P^e_i\), however, this increase makes the value go up from 0.5 to 1. Given that 1 is the maximal value \(\frac{P(e|i) - P(e|\lnot i)}{1 - P(e|\lnot i)}\) can take, this is a major impact. In fact, one can easily show that if \(\Delta P^e_i > 0\), then \(\Delta ^* P^e_i = 1\) iff \(P(e|i) = 1\).
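The figures in the last two paragraphs are easy to reproduce; a minimal Python sketch (our own illustration), with `naive` standing for the straightforward product measure:

```python
def delta_star(p_e_i, p_e_not_i):
    return (p_e_i - p_e_not_i) / (1 - p_e_not_i)

def naive(p_e_i, p_e_not_i):
    return (p_e_i - p_e_not_i) * p_e_i

# Raising P(e|i) by 0.05 versus lowering P(e|not-i) by 0.05:
print(delta_star(0.95, 0.85))  # 0.666...
print(delta_star(1.00, 0.85))  # 1.0  -> an increase of 0.33
print(delta_star(0.95, 0.80))  # 0.75 -> an increase of only 0.083

# Fixed contingency of 0.1; raising P(e|i) from 0.9 to 1:
print(naive(0.9, 0.8), naive(1.0, 0.9))            # 0.09 vs 0.10
print(delta_star(0.9, 0.8), delta_star(1.0, 0.9))  # 0.50 vs 1.00
```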

Above, we have argued that Douven's (2008, 2016) proposal that acceptability of conditionals of the form ‘If i, then e’ requires that \(\Delta P^e_i > 0\) and \(P(e|i) \ge \alpha \) for some fixed value \(\alpha \ge \frac{1}{2}\),Footnote 10 is too weak. Still, we do think that something like this should be required. We can guarantee this simply by demanding that \(\Delta ^* P^e_i \ge \alpha \). In case \(P(e|\lnot i) = 0\), this has the same effect as Douven's requirement. As \(P(e|\lnot i)\) increases, and with it P(e|i), however, \(\Delta P^e_i\) is allowed to be smaller on our proposal for the conditional still to be acceptable. And as argued above, this seems to be in accord with our intuitions: a value of, say, \(\Delta P^e_i = 0.05\) is insignificant if \(P(e|\lnot i) \approx \frac{1}{2}\), but counts for a lot when \(P(e|\lnot i)\) comes close to 1.

3 Causality and Conditionals

In the previous section we argued that conditionals of the form ‘If i, then e’ are assertable only if \(\Delta ^* P^e_i = \frac{\Delta P^{e}_{i}}{1 - P(e|\lnot i)}\) is high.Footnote 11 Still, various experiments [as reported, for instance, by Over and Evans (2003), Evans and Over (2004), Oaksford and Chater (2007), Douven and Verbrugge (2010), and Skovgaard-Olsen et al. (2016)] suggest that assertability correlates pretty well with conditional probability, especially for diagnostic conditionals. How can we square our proposal with those observations? Before we can answer this question, we will first show how the condition that \(\Delta ^* P^e_i\) be high can be derived from reading the conditional in a causal way.

We have argued above that the assertability, or acceptability, of conditionals of the form ‘If i, then e’ goes by \(\Delta ^* P^e_i\). But in the end this measures just something like correlation, and it is defined in terms of frequencies. We feel, however, that a conditional is not assertable, or appropriate, merely because of such correlations. The conditional is only acceptable if we can explain the correlations. A most natural explanation is a causal one: the correlation exists in virtue of a causal relation. Frequencies have no direction, but a conditional sentence is stated in an asymmetric way: first antecedent, then consequent. This naturally gives rise to the expectation that events/states of type i correlate with events/states of type e because the former type of event causes the latter. Indeed, Tversky and Kahneman (1980) have shown that this is our preferred way to interpret correlations.

To derive a causal interpretation of our measure \(\Delta ^* P^e_i\), we will assume with Cheng (1997) that objects, events or states of type i have unobservable causal powers to produce objects, events or states of type e. The assumption that objects or events have causal powers has a long history. For Aristotle, and many medieval philosophers following him, potentiality plays a major role, and this can be thought of as causal power. Powers are assumed as well by philosophers like Leibniz and Kant.Footnote 12 More recently, the existence of causal powers was argued for by Harré and Madden (1975), Shoemaker (1980), Cartwright (1989),Footnote 13 Ellis (1999) and increasingly many others. Within probability theory this view accords well with the propensity interpretation of probabilities of Popper (1959) and Mellor (1974). According to causal power theorists, a typical causal claim such as ‘aspirin relieves headaches’ says that aspirins, in virtue of being aspirins, have the power, or capacity, to relieve headaches. This is very different from the claim that aspirin intake is regularly followed by relief of headache. To make sense of causal powers, one has to assume that aspirins carry with them a ‘relatively enduring and stable capacity [...] from situation to situation; a capacity which may if circumstances are right reveal itself by producing a regularity’ (Cartwright 1989, p. 3). Notice that this requires that causal powers are independent of what actually is the case, or of actual probabilities. It is standardly assumed that Hume famously debunked any causal power theory. Strawson (1989) argued, however—extremely controversially, of course—that even Hume never gave up the natural idea that causal powers exist; it is just that their existence can never be observed or proved. One of the most interesting features of Cheng's (1997) derivation is that under certain natural conditions causal powers can be traced back to observable frequencies after all, or so she argues.

Let us assume that events of type e are either due to events of type i or due to alternative events. We will denote the (disjunction of the) set of alternative events by a. We will assume that i and a are probabilistically independent of one another, and thus that \(P(i \wedge a) = P(i) \times P(a)\). Our assumption that e can only be due to i or a means that without i or a, P(e) would be 0, i.e., \(P(e|\lnot i, \lnot a) = 0\). Given that \(\Delta ^* P^e_i\) is defined in terms of \(\Delta P^e_i\) and P(e|i), we would like to see what these turn out to be under our assumptions.

We will denote the unobservable causal power of i to produce e by \(p_{ie}\). Although \(p_{ie}\), just like P(e|i), lies between 0 and 1 and can be thought of as the probability with which i produces e when i is present in the absence of any other potential cause of e, it is in general not reducible to \(\frac{P(i \wedge e)}{P(i)}\). This power is a local property of i, and thus very different from P(e|i), which is only a global property. To capture this locality assumption, we will assume that \(p_{ie}\)—unlike P(e|i)—is independent of P(i), meaning that the probability that i occurs and produces e is \(P(i) \times p_{ie}\). We will denote by a the (disjunction of the) alternative causes of e, and by \(p_{ae}\) and P(e|a) the causal power of a to produce e and the conditional probability of e given a, respectively. With Cheng (1997) we will limit ourselves for the moment to causal structures like this: \(i \rightarrow e \leftarrow a\).Footnote 14 That is, cases where e has two and only two potential causes, and where i and a are independent of each other, meaning that \(P(i \wedge a) = P(i) \times P(a)\). In addition, we will assume that \(p_{ie}\) is independent of \(p_{ae}\).

To determine \(p_{ie}\), let us remember that e could only be caused by i or a. To determine the probability of their disjunction, \(P(i \vee a)\), we make use of standard probability calculus: \(P(i \vee a) = P(i) + P(a) - P(i \wedge a)\). Now we can determine P(e):

$$\begin{aligned} P(e) = P(i) \times p_{ie} + P(a) \times p_{ae} - (P(i \wedge a) \times p_{ie} \times p_{ae}). \end{aligned}$$
(3)

From this we can immediately derive \(p_{ie}\), the causal power of i to generate e. It is nothing but the probability of e conditional on i and \(\lnot a\):

$$\begin{aligned} p_{ie} = P(e|i, \lnot a). \end{aligned}$$
(4)

One problem with this notion is that it depends on a, which is not always observable. Thus it remains mysterious how anyone could know, or reasonably estimate, the causal power of i to produce e. Fortunately, on our assumption that i and a are, or are believed to be, independent, we can make such an estimate after all. Assuming independence of i and a, P(e) becomes

$$\begin{aligned} P(e) = (P(i) \times p_{ie}) + (P(a) \times p_{ae}) - (P(i) \times P(a) \times p_{ie} \times p_{ae}) . \end{aligned}$$
(5)

As before, \(\Delta P^e_i\) is going to be defined in terms of conditional probabilities:

$$\begin{aligned} \Delta P^e_i = P(e|i) - P(e|\lnot i). \end{aligned}$$
(6)

The relevant conditional probabilities are now defined as follows

$$\begin{aligned} P(e|i)&= p_{ie} + (P(a|i) \times p_{ae}) - (p_{ie} \times P(a|i) \times p_{ae})\\ P(e|\lnot i)&= P(a|\lnot i) \times p_{ae}\qquad \hbox {(derived from (5), because } P(i|\lnot i) = 0\hbox {)} \end{aligned}$$
(7)

As a result, \(\Delta P^e_i\) comes down to

$$\begin{aligned} \Delta P^e_i&= p_{ie} + (P(a|i) \times p_{ae}) - (p_{ie} \times P(a|i) \times p_{ae}) - (P(a|\lnot i) \times p_{ae})\\ &= [p_{ie} - (p_{ie} \times P(a|i) \times p_{ae})] + [(P(a|i) \times p_{ae}) - (P(a|\lnot i) \times p_{ae})]\\ &= [1 - (P(a|i) \times p_{ae})] \times p_{ie} + [P(a|i) - P(a|\lnot i)] \times p_{ae}. \end{aligned}$$
(8)

From this last formula we can derive \(p_{ie}\) as follows:

$$\begin{aligned} p_{ie}= \frac{ \Delta P^e_i - [P(a|i) - P(a|\lnot i)] \times p_{ae}}{1 - P(a|i) \times p_{ae}}. \end{aligned}$$
(9)

One problem with (9) is that it still crucially depends on unobservable quantities: P(a|i) and the causal power \(p_{ae}\) of a to produce e. But on our independence assumptions one can determine \(p_{ie}\) in terms of observable frequencies. Notice that because of the assumed independence, \(P(a|i) = P(a|\lnot i)\). As a result, (9) comes down to

$$\begin{aligned} p_{ie} = \frac{\Delta P^e_i}{1 - P(a|i) \times p_{ae}}. \end{aligned}$$
(10)

By independence again, it follows that \(P(a|i) \times p_{ae} = P(a) \times p_{ae} = P(e|\lnot i)\). The latter equality holds because \(P(a) \times p_{ae}\) is the probability that e occurs and is produced by a. Now, \(P(e|\lnot i)\) estimates \(P(a) \times p_{ae} \) because i occurs independently of a, and, in the absence of i, only a produces e. It follows that \(p_{ie}\) can be estimated in terms of observable frequencies as follows:

$$\begin{aligned} p_{ie} = \frac{\Delta P^e_i}{1 - P(e|\lnot i)}. \end{aligned}$$
(11)

But this is exactly \(\Delta ^* P^e_i\), the measure in terms of which we explained the acceptability of conditionals in Sect. 2. Thus, if we assume that a (generic) conditional of the form ‘If i, then e’ is appropriate because events of type i cause, or produce, events of type e, we derive exactly the same appropriateness condition as we proposed in Sect. 2!
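This derivation can be checked numerically. The following Python sketch (our own illustration; the parameter values are arbitrary) builds the joint distribution over i, a and e for the structure \(i \rightarrow e \leftarrow a\) with independent causes and noisy-OR generation of e, and confirms that the right-hand side of (11) recovers \(p_{ie}\) exactly:

```python
from itertools import product

def joint(P_i, P_a, p_ie, p_ae):
    """Joint distribution over (i, a, e): independent causes i and a,
    noisy-OR generation of e (e has no other causes)."""
    dist = {}
    for i, a in product([0, 1], repeat=2):
        p_causes = (P_i if i else 1 - P_i) * (P_a if a else 1 - P_a)
        p_e = 1 - (1 - p_ie * i) * (1 - p_ae * a)  # prob. that i or a produces e
        dist[(i, a, 1)] = p_causes * p_e
        dist[(i, a, 0)] = p_causes * (1 - p_e)
    return dist

P_i, P_a, p_ie, p_ae = 0.3, 0.6, 0.8, 0.5   # arbitrary parameter values
d = joint(P_i, P_a, p_ie, p_ae)

def prob(pred):
    return sum(p for (i, a, e), p in d.items() if pred(i, a, e))

P_e_i     = prob(lambda i, a, e: e and i) / prob(lambda i, a, e: i)
P_e_not_i = prob(lambda i, a, e: e and not i) / prob(lambda i, a, e: not i)

print((P_e_i - P_e_not_i) / (1 - P_e_not_i))   # 0.8 (= p_ie), as (11) claims
```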

4 Causal Conditionals and Conditional Probability

In the previous section we have shown that on a causal reading of conditionals, it is natural to conclude that a conditional of the form ‘If i, then e’ is acceptable only if \(\Delta ^* P^e_i\) is high. Although we argued in Sect. 2 that this is an appropriate prediction, various empirical studies show that many conditional sentences are accepted just in case the conditional probability of e given i, i.e., P(e|i), is high. This might seem to show that a causal analysis of such conditionals is impossible. In this section we will argue that this is not the right conclusion: although under certain circumstances a causal reading of a conditional goes by high \(\Delta ^* P^e_i\), under other circumstances it rather goes by high P(e|i). How can that be, and what are these other circumstances?

Recall that in the previous section we assumed that e had two potential causes, i and a, and that these causes were independent of each other: \(P(i \wedge a) = P(i) \times P(a)\). In this section we will first show that if we give up this independence assumption in the most radical ways, \(p_{ie}\) turns out to be equal to P(e|i). Afterwards we will argue that it is actually quite natural to interpret conditionals such that the independence conditions are given up in these radical ways.

Let us first look at the extreme case where a and i are incompatible, and thus \(P(i \wedge a) = 0\). From (3), P(e) is then defined as follows:

$$\begin{aligned} P(e) = P(i) \times p_{ie} + P(a) \times p_{ae}. \end{aligned}$$
(12)

Due to our assumption of incompatibility, i.e., \(P(a|i) = 0\), we can derive from this that \(P(e|i) = p_{ie}\). This value is larger than \(\Delta ^* P^e_i = \frac{P(e|i) - P(e|\lnot i)}{1 - P(e|\lnot i)}\) if \(P(e|i) \not = 1\) (and \(P(e|\lnot i) > 0\)). It follows that on a causal reading of the conditional ‘If i, then e’ on which one assumes that the alternative causes are incompatible, the interpretation is (only) going to be stronger: at the very least, the value of \(\Delta ^* P^e_i\) has to be high.

Now look at the other extreme case: \(a = i\), so that e has only one potential cause. In that case it immediately follows that \(P(e) = P(i) \times p_{ie}\), and thus that \(P(e|i) = p_{ie}\). The same result holds if we assume that a entails i. In that case P(e) is defined as follows:

$$\begin{aligned} P(e)&= P(i) \times p_{ie} + P(a) \times p_{ae} - P(a) \times p_{ae}\\ &= P(i) \times p_{ie} \end{aligned}$$
(13)

From this we derive

$$\begin{aligned} P(e|i)&= p_{ie} \\ P(e|\lnot i)&= 0 \end{aligned}$$
(14)

and thus

$$\begin{aligned} \Delta ^* P^e_i = P(e|i). \end{aligned}$$
(15)

In contrast to our previous case, there is now no difference between \(\Delta ^* P^e_i\) and \(p_{ie}\). In any case, we see that when i and a are incompatible, or when i is the only potential cause of e, the causal power of i to produce e, \(p_{ie}\), is just the conditional probability P(e|i)!
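The contrast between these structural assumptions can be made concrete with a small Python sketch (our own illustration; the distributions over the causes and the powers are arbitrary choices):

```python
def measures(cause_dist, p_ie, p_ae):
    """Given a distribution over (i, a) and the causal powers, return
    P(e|i) and the relative difference, with noisy-OR generation of e."""
    P_i = P_ei = P_noti = P_enoti = 0.0
    for (i, a), p in cause_dist.items():
        p_e = 1 - (1 - p_ie * i) * (1 - p_ae * a)   # prob. that i or a produces e
        if i:
            P_i += p; P_ei += p * p_e
        else:
            P_noti += p; P_enoti += p * p_e
    P_e_i, P_e_not_i = P_ei / P_i, P_enoti / P_noti
    return P_e_i, (P_e_i - P_e_not_i) / (1 - P_e_not_i)

p_ie, p_ae = 0.8, 0.5
independent  = {(1, 1): 0.18, (1, 0): 0.12, (0, 1): 0.42, (0, 0): 0.28}  # P(i)=0.3, P(a)=0.6
incompatible = {(1, 0): 0.30, (0, 1): 0.60, (0, 0): 0.10}                # P(i & a) = 0
single_cause = {(1, 0): 0.30, (0, 0): 0.70}                              # a is absent

for name, dist in [('independent', independent), ('incompatible', incompatible),
                   ('single cause', single_cause)]:
    print(name, measures(dist, p_ie, p_ae))
# independent : P(e|i) = 0.86, delta* = 0.8  (= p_ie, by equation (11))
# incompatible: P(e|i) = 0.8  (= p_ie), delta* = 0.65 (smaller than P(e|i))
# single cause: P(e|i) = 0.8  (= p_ie), delta* = 0.8  (= P(e|i))
```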

Can this explain the experimental results which indicate that the acceptability of a conditional goes in many (though not all) cases with its conditional probability? In particular, can we explain the observation that the acceptability of an indicative conditional of the form ‘If i, then e’ correlates well with its conditional probability P(e|i) in case (i) the conditional is causal in nature (Over et al. 2007) and (ii) \(\Delta P^e_i > 0\) (Skovgaard-Olsen et al. 2016)? We think it can, to a large extent, because we believe that (i) the fact that \(\Delta P^e_i > 0\) suggests to people that there exists a causal relation between i and e (Tversky and Kahneman 1980 already found that people prefer a causal interpretation of correlations), and (ii) for pragmatic reasons people generally assume—perhaps due to the assertion of the conditional—that there is only one cause of e, or that the alternative potential causes of the consequent, i.e., i and a, are incompatible with each other.

There could be various reasons for (ii). Most obviously, it could be shared knowledge that e has only one potential cause, or that i and the alternative causes of e are simply incompatible. Then the result follows immediately. The more interesting case, however, is that the idea that there is only one cause is due to a pragmatic effect. First, there is empirical evidence that when people's attention is drawn to one possible cause, they tend to overlook the possible existence of alternative causes (cf. Koehler 1991; Brem and Rips 2000).Footnote 15 Thus, due to the assertion of ‘If i, then e’, the hearer will assume that i is the only cause of e, and thus \(p_{ie} = P(e|i)\). Second, it is well known that indicative conditionals of the form ‘If i, then e’ tend to be interpreted (via ‘conditional perfection’, Geis and Zwicky 1971) as also entailing ‘If \(\lnot i\), then \(\lnot e\)’. It is controversial which types of conditionals allow for this strengthening, but it is clear that ‘causal’ conditionals like ‘If you work hard, (then) you'll succeed’ are prime examples. The mechanism behind this strengthened interpretation is controversial as well. On one hypothesis, the strengthening is due to a Gricean scalar implicature: if the speaker only asserts ‘If i, then e’, by Gricean reasoning one can conclude that it is not the case that ‘If a, then e’, for any alternative a to i. Alternatively, the question under discussion is whether e is, or will be, the case, and the answer is ‘If i, then e’. The strengthened reading then follows from standard exhaustive interpretation of the answer. Whatever the mechanism, the result is the same: the use of the conditional gives rise to the expectation that i is the only reason why e could be the case. Thus, there is only one cause of e, and thus \(p_{ie} = P(e|i)\).

Now for incompatibility. If i and a are the alternative causes of e, they are also the alternative answers to the question ‘Why e?’ Although, in general, answers to a question might be incomplete and thus compatible with each other, in question semantics it is standardly assumed (cf. Groenendijk and Stokhof 1984) that the answers are complete answers, and complete answers are taken to be incompatible with each other. Thus, or so the reasoning goes, if it is known, or stated, that i and a are the alternative causes of e, it is assumed that i and a are the complete causal explanations of e, and thus are taken to be incompatible with each other. As a result, \(p_{ie} = P(e|i)\).Footnote 16

5 Diagnostic Conditionals and Conditional Probability

In the above sections we have discussed causal conditionals like

  1. (16)
    1. a.

      If John is nervous, he smokes.

    2. b.

      If fire, then smoke.

In terms of our causal structure \(i \rightarrow e \leftarrow a\) they are of the form ‘If i, then e’. But many natural language conditionals are stated in the reverse form ‘If e, then i’:

(17)

  a. If John smokes, he is nervous.

  b. If smoke, then fire.

The clear intuition for the above conditionals is that they are appropriate because e signals that i is the case. There exists an evidential, or diagnostic, but no causal, dependence relation from e to i. Given that causation is asymmetric, and that we analyze conditionals in terms of causal powers, one wonders how we could analyze such conditionals. Before we delve into that question, let us first remind ourselves that there is another empirical fact to be explained. Evans et al. (2007) and Douven and Verbrugge (2010) have shown experimentally that the acceptability of such so-called diagnostic conditionals is, although not identical to, still very close to the conditional probability of the consequent given the antecedent, P(i|e).Footnote 17 Thus, we would like to investigate the following two questions:

  1. Can we explain the appropriateness of diagnostic conditionals in terms of causal powers? And if so,

  2. can we explain the relatively strong correlation that exists between the assertability/acceptability of such conditionals on the one hand, and the conditional probability of the consequent given the antecedent on the other?

In this section we will provide an explanation for both on the assumption that diagnostic conditionals should have somewhat different acceptability requirements than causal conditionals.

We will assume that the probability of evidential, or diagnostic, conditionals should be measured by what Cheng et al. (2007) call the probability that i caused e, or perhaps by what they call the probability that \(i \ alone\) caused e. To see what this comes down to, let us first determine the probability that, given e, e is due to i: \(P(i \leadsto e|e)\). Given that e is caused by i with probability \(P(i) \times p_{ie}\), this can be given as follows:Footnote 18, Footnote 19

$$\begin{aligned} P(i \leadsto e|e) = \frac{P(i) \times p_{ie}}{P(e)}. \end{aligned}$$
(18)

Recall that we have claimed in Sect. 2 that the conditional ‘If i, then e’ is appropriate only if \(\Delta ^* P^e_i\) is high, which means that \(\Delta ^* P^e_i>\!> \Delta ^* P^e_a\). Assuming a reading, or appropriateness condition, of diagnostic conditionals in terms of \(P(i \leadsto e|e)\), this means that ‘If e, then i’ is appropriate only if \(P(i \leadsto e|e)>\!> P(a \leadsto e|e)\). Now suppose that we assume that \(P(i) \approx P(a)\). Then it follows that \(P(i \leadsto e|e)>\!> P(a \leadsto e|e)\) iff \(p_{ie}>\!> p_{ae}\). Thus, from the fact that ‘If e then i’ is a good conditional, together with the commonly assumed causal structure \(i \rightarrow e \leftarrow a\), it follows that i is taken to be the best causal explanation of e, at least if \(P(i) \approx P(a)\).

Because \(p_{ie}>\!> p_{ae}\), it follows that \(P(e|\lnot i) = P(a) \times p_{ae}\) will be low, and thus that \(P(e|i) - P(e|\lnot i)\) will be close to P(e|i). Because for the same reason \(1 - P(e|\lnot i)\) will be close to 1, \(p_{ie} = \frac{P(e|i) - P(e|\lnot i)}{1 - P(e|\lnot i)}\) will be close to P(e|i). But then \(P(i \leadsto e|e) = \frac{P(i) \times p_{ie}}{P(e)}\) will be close to \(\frac{P(i) \times P(e|i)}{P(e)} = \frac{P(i \wedge e)}{P(e)} = P(i|e)\), the conditional probability of the consequent given the antecedent of ‘If e, then i’. In this way we have explained the experimental results of Evans et al. (2007) and Douven and Verbrugge (2010) that the acceptability of diagnostic conditionals like (17-a)-(17-b) correlates well with the conditional probability of the consequent given the antecedent.
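Numerically, the point can be illustrated with a minimal Python sketch (our own illustration under the independence assumptions of Sect. 3; the parameter values are arbitrary, chosen so that \(p_{ie}>\!> p_{ae}\) and \(P(i) \approx P(a)\)):

```python
P_i, P_a   = 0.4, 0.4    # (roughly) equally probable causes
p_ie, p_ae = 0.9, 0.05   # i has a much greater causal power than a

# P(e) under independence, equation (19):
P_e = P_i * p_ie + P_a * p_ae - P_i * p_ie * P_a * p_ae

# Probability that, given e, e is due to i, equation (18):
P_i_produces_e = P_i * p_ie / P_e

# Conditional probability of the consequent given the antecedent of 'If e, then i':
P_e_given_i = 1 - (1 - p_ie) * (1 - P_a * p_ae)   # noisy-OR; a independent of i
P_i_given_e = P_e_given_i * P_i / P_e

print(P_i_produces_e, P_i_given_e)   # ~0.966 vs ~0.968: nearly identical
```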

Similar conclusions follow when we account for the acceptability of diagnostic conditionals in terms of the probability that, given e, e is due to i alone: \(P(i \ alone \ \leadsto e|e)\), which abbreviates \(P((i \wedge \lnot (i \wedge a)) \ \leadsto e|e)\). To derive the latter notion, recall from the previous section that if i and a are independent of each other,

$$\begin{aligned} P(e) = P(i) \times p_{ie} + P(a) \times p_{ae} - P(i) \times p_{ie} \times P(a) \times p_{ae}. \end{aligned}$$
(19)

Because we have seen above that \(P(e|\lnot i)\) estimates \(P(a) \times p_{ae}\), this reduces to

$$\begin{aligned} P(e) = P(i) \times p_{ie} + P(e|\lnot i) - P(i) \times p_{ie} \times P(e|\lnot i). \end{aligned}$$
(20)

Given that e is caused by i with probability \(P(i) \times p_{ie}\) and that e is caused by \(i \wedge a\) with probability \(P(i) \times p_{ie} \times P(e|\lnot i)\), it follows that

$$\begin{aligned} P(i \ alone \ \leadsto e|e)&= \frac{P(i) \times p_{ie} - P(i) \times p_{ie} \times P(e|\lnot i)}{P(e)}\\ &= \frac{P(i) \times p_{ie} \times [1 - P(e|\lnot i)]}{P(e)}. \end{aligned}$$
(21)

Now, recall that under independence conditions we have derived the following for \(p_{ie}\)

$$\begin{aligned} p_{ie} = \frac{\Delta P^e_i}{1 - P(e|\lnot i)} = \frac{P(e|i) - P(e|\lnot i)}{1 - P(e|\lnot i)}. \end{aligned}$$
(22)

Substituting this measure in the above formula gives us

$$\begin{aligned} P(i \ alone \ \leadsto e|e) = \frac{P(i) \times \Delta P^e_i}{P(e)}. \end{aligned}$$
(23)

We want to conclude from sentences (17-a)-(17-b) of the form ‘If e then i’ that \(P(i \ alone\ \leadsto e|e)>\!> P(a \ alone\ \leadsto e|e)\). This holds iff \(\frac{P(i) \times \Delta P^e_i}{P(e)}>\!> \frac{P(a) \times \Delta P^e_a}{P(e)}\). If we now assume that \(P(i) \approx P(a)\), it follows that \(P(i \ alone \ \leadsto e|e)>\!> P(a \ alone\ \leadsto e|e)\) iff \(\Delta P^e_i>\!> \Delta P^e_a\). Thus, from the fact that ‘If e then i’ is a good conditional, together with the commonly assumed causal structure \(i \rightarrow e \leftarrow a\), it follows again that i is taken to be the best causal explanation of e, at least if \(P(i) \approx P(a)\).

Now, can we also derive from this that the acceptability of the conditional goes with conditional probability? Yes, we can, because

$$\begin{aligned} P(i \ alone \ \leadsto e|e)&= \frac{P(i) \times \Delta P^e_i}{P(e)}\\ &= \frac{[P(e|i) \times P(i)] - [P(e|\lnot i) \times P(i)]}{P(e)}\\ &= \frac{P(e \wedge i) - P(e|\lnot i) \times P(i)}{P(e)}. \end{aligned}$$
(24)

Because \(P(i \ alone\ \leadsto e|e)>\!> P(a \ alone\ \leadsto e|e)\), it follows that \(P(e|\lnot i) = P(a) \times p_{ae}\) will be low, and thus that \(P(i \ alone \ \leadsto e|e)\) will be close to \(\frac{P(e \wedge i)}{P(e)} = P(i|e)\). In this way we have again explained the experimental results of Evans et al. (2007) and Douven and Verbrugge (2010) that the acceptability of diagnostic conditionals like (17-a)-(17-b) correlates well with the conditional probability of the consequent given the antecedent.
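With the same arbitrary parameter values as in the sketch above, the ‘i alone’ measure also comes out close to P(i|e); a minimal sketch:

```python
P_i, P_a   = 0.4, 0.4
p_ie, p_ae = 0.9, 0.05

P_e_not_i = P_a * p_ae                                  # estimate from Sect. 3
P_e = P_i * p_ie + P_e_not_i - P_i * p_ie * P_e_not_i   # equation (20)

# Probability that, given e, e is due to i alone, equation (21):
P_i_alone = P_i * p_ie * (1 - P_e_not_i) / P_e

P_e_given_i = 1 - (1 - p_ie) * (1 - P_e_not_i)          # noisy-OR
P_i_given_e = P_e_given_i * P_i / P_e

print(P_i_alone, P_i_given_e)   # ~0.946 vs ~0.968: close, since P(e|not-i) is small
```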

As usual, if we assume that i is the only potential cause of e, or if we assume incompatibility of i and a instead of independence, the derivations of our two desired conclusions are much easier.

If i is (taken to be) the only potential cause of e, the inference is trivial. First, assume that the diagnostic conditional is interpreted in terms of \(P(i \leadsto e|e) = \frac{P(i) \times p_{ie}}{P(e)}\). We have seen before that in case i is the only potential cause of e, \(p_{ie} = P(e|i)\). But this means that \(P(i \leadsto e|e) = \frac{P(i) \times P(e|i)}{P(e)} = \frac{P(i \wedge e)}{P(e)} = P(i|e)\). This indicates that (17-a) is good if P(i|e) is high, or significantly higher than P(a|e), which is what we had to explain.

So far so good, but what if we assume that the diagnostic conditional is interpreted in terms of \(P(i \ alone \ \leadsto e|e)\)? Recall that in general

$$\begin{aligned} P(i \ alone \ \leadsto e|e) = \frac{P(i) \times p_{ie} \times [1 - P(e|\lnot i)]}{P(e)}. \end{aligned}$$
(25)

If i is the only potential cause of e, \(P(e|\lnot i) = 0\), and thus \(P(i \ alone \ \leadsto e|e) = \frac{P(i) \times p_{ie}}{P(e)} = \frac{P(i \wedge e)}{P(e)} = P(i|e)\), because if i is the only cause of e, \(p_{ie} = P(e|i)\).

For the case that i and a are incompatible, the derivation is equally straightforward. First, notice that in that case \(P(i \ alone \ \leadsto e|e) = P(i \leadsto e|e) = \frac{P(i) \times p_{ie}}{P(e)}\). We have seen in the previous section that in case i and a are incompatible, \(p_{ie} = P(e|i)\). It follows that \(P(i \ alone \ \leadsto e|e) = P(i \leadsto e|e) = \frac{P(i) \times P(e|i)}{P(e)} = P(i|e)\). This indicates, again, that (17-a)-(17-b) are good if P(i|e) is high, or significantly higher than P(a|e).

So far in this section we have assumed that conditionals of the form ‘If i, then e’ are ambiguous: they are appropriate either due to a high \(p_{ie} = \Delta ^* P^e_i\), or due to a high \(P(e \leadsto i|i)\). But are (the appropriateness conditions of) conditionals really ambiguous in this way? Can't we interpret them uniformly in terms of a high \(\Delta ^* P^e_i\)? Given the form of the conditional, this means that we want to explain why \(P(e \leadsto i|i)\) is high on the assumption that \(\Delta ^* P^e_i\) is high. It turns out that we can.

Because the conditional is appropriate, it follows that \(\Delta ^* P^e_i = \frac{ P(e|i) - P(e|\lnot i)}{1 - P(e|\lnot i)}\) is high. Now recall from Sect. 2 that \(\Delta ^* P^e_i = \frac{ P(e|i) - P(e|\lnot i)}{1 - P(e|\lnot i)} = \frac{ P(e|i) - P(e)}{P(\lnot e \wedge \lnot i)}\). Similarly, \(\Delta ^* P^i_e = \frac{ P(i|e) - P(i)}{P(\lnot e \wedge \lnot i)}\). One can prove that \(P(e|i) - P(e) = \frac{P(e)}{P(i)} \times [P(i|e) - P(i)]\),Footnote 20 and thus that \(\Delta ^*P^e_i = \frac{P(e)}{P(i)} \times \Delta ^* P^i_e\). But recall that under suitable independence conditions \(\Delta ^* P^i_e = p_{ei}\). It follows that \(\Delta ^*P^e_i = \frac{P(e)}{P(i)} \times p_{ei} = \frac{P(e) \times p_{ei}}{P(i)} = P(e \leadsto i|i)\). Thus, under suitable independence conditions, as far as the numbers are concerned, \(\Delta ^*P^e_i = p_{ie} = P(e \leadsto i|i)\)! As a consequence, a conditional of the form ‘If i, then e’ is appropriate only if \(\Delta ^* P^e_i\) is high, which means that, depending on the (assumed) causal structure, either \(p_{ie}\) is high or \(P(e \leadsto i|i)\) is high! We can conclude that we don't need two different types of conditions for a conditional of the form ‘If i, then e’ to be appropriate. One condition will do, and depending on the assumed causal structure, it will give rise to the desired causal reading. We have seen that only under some strong conditions does a high \(\Delta ^* P^e_i\) give rise to a high P(e|i) under the causal forward reading of the conditional, but also that it more naturally gives rise to a high P(e|i) under a diagnostic, or evidential, reading. We take this to be in accordance with the experimental observations of Douven and Verbrugge (2010).
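For the record, the identity appealed to here can be verified directly:

$$\begin{aligned} P(e|i) - P(e) = \frac{P(i \wedge e) - P(i)P(e)}{P(i)} = \frac{P(e)}{P(i)} \times \frac{P(i \wedge e) - P(i)P(e)}{P(e)} = \frac{P(e)}{P(i)} \times [P(i|e) - P(i)]. \end{aligned}$$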

Of course, for a conditional of the form ‘If i, then e’ to be appropriate it need not be the case that (i) i is a cause of e, or that (ii) e is the cause of i. It might simply be the case that there exists a semantic or deductive relation between i and e: if someone is a bachelor, he is an unmarried man, and if y + 2 = 6, then y = 4. Another reason why a conditional can be true might be what is sometimes called ‘metaphysical grounding’. The experimental data of Douven and Verbrugge (2010) show that for such cases there exists a strong correlation between acceptance of the conditional and its conditional probability. It is also quite clear why: both acceptance and conditional probability will be 1. Let us therefore concentrate on a more challenging type of ‘empirical’ conditional whose acceptability is due neither to (i) nor to (ii): the case where there is a common cause of i and e. Suppose that we have a causal structure of the form \(i \leftarrow c \rightarrow e\), and thus a structure where i neither causes nor is caused by e. To make it concrete, let i stand for the falling barometer, e for the storm, and c for low pressure. With this instantiation of the variables, the conditional ‘If i, then e’ is appropriate. We do not know whether there exists a strong correlation between acceptance of the conditional and the corresponding conditional probability for such cases, but we can make clear under which circumstances this could be the case.

It seems natural that the probability of the conditional ‘If i, then e’ can now be measured by \(P(c \leadsto i|i) \times p_{ce}\). This is \(\frac{P(c) \times p_{ci}}{P(i)} \times p_{ce} = \frac{P(c) \times p_{ci} \times p_{ce}}{P(i)}\). We have seen above that under natural conditions this is nothing but \(\frac{P(c \wedge i \wedge e)}{P(i)} = P(c \wedge e| i)\). If we now assume that (almost) only c can cause i, and thus that \(P(c|i) \approx 1\), \(P(c \leadsto i|i) \times p_{ce}\) comes down to P(e|i), the conditional probability.
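The common cause case can also be made concrete with a small sketch (our own illustration; for simplicity we let c be the only potential cause of both i and e, so that \(P(c|i) = 1\) holds exactly):

```python
P_c, p_ci, p_ce = 0.3, 0.7, 0.8   # low pressure and its powers to produce i and e

P_i = P_c * p_ci                      # c is the only cause of the falling barometer
measure = (P_c * p_ci / P_i) * p_ce   # P(c ~> i | i) * p_ce

# With P(c|i) = 1 and e independent of i given c, P(e|i) is just p_ce:
P_e_given_i = p_ce

print(measure, P_e_given_i)   # both 0.8: the measure reduces to P(e|i)
```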

6 Conclusion

Our goal was modest: to explain why the acceptability of many conditionals ‘goes by’ conditional probability, while at the same time accounting for a dependence relation between antecedent and consequent. We have shown that this is possible once we assume that the appropriateness of conditionals depends on causal facts. Although in many—perhaps normal—cases acceptability cannot go with conditional probability on our causal assumption, we have identified two cases in which it can: (i) when the antecedent can be thought of as causing (or having caused) the consequent, but the antecedent stands in a strong dependence relation to the alternative causes, or (ii) when the conditional is read diagnostically, on the assumption that the alternative causes i (the consequent) and a of the to-be-explained antecedent are either incompatible with each other or equally probable. A pleasing consequence of our analysis is that we did not need two separate conditions for a conditional of the form ‘If i, then e’ to be appropriate. One condition will do: \(\Delta ^* P^e_i\) should be high.