1 The independence strategy

Goodman’s new riddle of induction is new no longer.Footnote 1 Many answers to the riddle have been proposed.Footnote 2 The most popular answers break our predicates, or the properties they express, into those that are inductively projectible and those that are not. The basis for this has varied. Perhaps it is because some predicates are well-entrenched but others are not.Footnote 3 Perhaps it is because some predicates express qualitative properties but others do not.Footnote 4 Perhaps it is because some predicates are ostensive but others are not.Footnote 5 Or perhaps it is because some predicates pick out natural kinds but others do not.Footnote 6 Whatever their differences, these approaches all agree that scientific methodology requires a slight philosophical tweak. Philosophers must write a prolegomenon to any future statistics textbook, so to speak.

To others it has seemed better to let statistics take care of itself. All predicates are projectible in the right circumstances. When we project from sample to population, what matters is independence. Application of the predicates involved shouldn’t depend on our sampling method in any fashion. This is not a philosophical addition to statistical methodology; statisticians already recognize the necessity of sampling independence. They do everything they can to guard against the observer effect in all of its forms. They aim for a fair sample of the population under study. And they keep their eyes open to look for confounding variables. The independence strategy claims that the grue puzzle teaches us these old lessons in a new form.

The strategy was first pursued in the 1970s, when different versions of it were defended by T.E. Wilkerson, John Moreland, and Frank Jackson.Footnote 7 It then disappeared, but has more recently made a comeback. Variants of Jackson’s approach have been endorsed by Peter Godfrey-Smith and Samir Okasha.Footnote 8 And related approaches have been developed by Alfred Schramm and Wolfgang Freitag.Footnote 9 I think that independence is the solution to the grue puzzle, but that no extant independence theory gets things completely right. Here I attempt to do better.

2 The explanatory target

After observing many green emeralds, we conclude that all emeralds are green. Yet the emeralds we’ve seen have also been grue—either green and previously observed or blue and previously unobserved—and we don’t conclude that all emeralds are grue.Footnote 10 Obviously, these policies are correct. The philosophical challenge is to explain why.

On this understanding, the challenge of grue is explanatory, epistemological, and methodological all at once. Despite some of Goodman’s rhetoric, this coheres well with how the new riddle is generally understood. Given our evidence, the inductive argument to “all emeralds are green” is good, and the inductive argument to “all emeralds are grue” is bad. Why? This is the question to be answered, whatever your favored terms for epistemic goodness and badness.

I’m going to use “warrant” as a covering term, but whatever epistemic terminology we fix on, we will need to distinguish between external and internal types of warrant.Footnote 11 If I absurdly but sincerely believe that I can fly, there is a sense in which I am warranted in jumping off of the Empire State Building’s observation deck to make a lunch meeting downtown. There is also a sense in which I am not warranted, since my belief that I can fly is itself unwarranted. Some types of “warrant” are external and worldly; others are more internal and psychological. Most likely, both of these types of warrant themselves come in many different flavors. This central distinction, along with other related distinctions, is commonly made in epistemological discussions of externalism and internalism. There is a crucial difference between these types of warrant. If our inductive practices are externally warranted, then gruesome practices are not. Yet green lovers and grue lovers alike can both be internally warranted in their inductive practices at the same time and in the same world, at least in principle.

These two notions correspond to at least two distinct grue challenges. The internal challenge of grue is to provide an account of why we are internally warranted in projecting “green” but not “grue”. The external challenge of grue is to provide an account of why we are externally warranted in projecting “green” but not “grue”. Although the challenges are distinct, they are rarely distinguished in discussions of grue.Footnote 12 A failure to distinguish between them has only added to “the” new riddle’s difficulty.

I don’t think we will have a satisfying solution until we are able to answer both challenges. My goal is to formulate an independence theory that does this. I start by critically examining every extant independence theory. I do this not just to take a pleasant stroll down memory lane, but to learn enough to succeed where others have fallen short.

3 The search for independence: beginning

Independence-based approaches were introduced in 1973 in a two-page note by T.E. Wilkerson. To start, Wilkerson noted that setting up the grue riddle requires assuming that all emeralds are green. He also noted that the reference to time in the definition of “grue” only served the purpose of restricting the grue emeralds to those in our sample. Accordingly, “grue” can be redefined as being either green and in our sample, or blue and not in our sample. This makes the unprojectibility of “grue” unsurprising, since it is a logical truth that the emeralds that aren’t in our sample aren’t in our sample.

The crucial idea is that for a given green emerald \({\textsf{e}}\), the truth of “\({\textsf{Grue}}({\textsf{e}})\)” is “determined entirely” by the truth of “\({\textsf{Sample}}({\textsf{e}})\)”. And even without the assumption that all emeralds are green, the truth of “\({\textsf{Grue}}({\textsf{e}})\)” partially depends on the truth of a sampling claim. Insofar as it does, it is not projectible. Wilkerson suggests that this diagnosis will work for all gruesome predicates.

Wilkerson’s forgotten note contains, in brief, the crucial idea that the key to solving the new riddle is recognizing that the grue-facts are dependent in a way that the green-facts are not. But he doesn’t tell us anything about how to understand “dependence”. Without more development, many gruesome predicates will slip through. So too will other problematic inductions. All of the lobsters I have observed have been red, but this is because all the lobsters I have observed have been cooked. Clearly I am not justified in concluding that all lobsters are red.Footnote 13 The problem here isn’t merely sample-dependence.

Shortly after Wilkerson’s paper appeared, a March 1974 APA presentation by John Moreland gave an account of grue along broadly similar lines. As far as I can tell, Moreland’s account was not influenced by Wilkerson’s, but I see them as related. Moreland’s published account, from 1976, though excellent, is—like Wilkerson’s—completely unknown currently.Footnote 14 This is probably because Moreland situated his approach within now mostly-forgotten debates over Carnapian confirmation theory.Footnote 15 But his philosophical points don’t depend upon this.

Within the Carnapian framework, Moreland argues that gruesomeness is avoided if our sample is random. He notes that talk of samples being “random” is really a comment on how the sample was drawn—an n-membered sample S drawn from a population P by method M is random just in case any other n-tuple of members of P was just as likely to be drawn by M as S was. This is basically one of the standard textbook definitions of statistical randomness. Unfortunately it is absurdly demanding. Almost no sampling method is “random” in this sense.

In response Moreland redefines “randomness”. In my terminology, his definition says that a sample drawn from population F is random with respect to property G just in case our credence that \(\alpha \) is in the sample conditional on it being a member of the population is identical to our credence that \(\alpha \) is in the sample conditional on it being a member of the population that is also G:

$$\begin{aligned} {\textrm{Cr}}\left( \dfrac{{\textsf{Sample}}(\alpha )}{F\alpha }\right) ={\textrm{Cr}}\left( \dfrac{{\textsf{Sample}}(\alpha )}{F\alpha \wedge G\alpha }\right) \end{aligned}$$

He actually talks of “choosing” a confirmation function using your beliefs about the sampling method, but I have put things in more modern, subjective terms. In this rendering, our credences are informed, for any sample predicate, by our beliefs about the sampling method.
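The condition can be made concrete with a minimal sketch in Python. It treats credences as exact probabilities over a small made-up population of emeralds, each with a stipulated chance of being drawn, and it uses the sample-based reading of “grue” (green and in the sample, or blue and not in the sample). The population, the inclusion probabilities, and every function name are illustrative assumptions, not anything taken from Moreland’s paper.

```python
from fractions import Fraction

def outcomes(items):
    """Expand hypothetical (colour, inclusion probability) pairs into weighted
    (colour, sampled, weight) outcomes, with the item chosen uniformly."""
    n = len(items)
    space = []
    for colour, p in items:
        space.append((colour, True, p / n))
        space.append((colour, False, (1 - p) / n))
    return space

def cr_sample_given(space, g):
    """Cr(Sample(a) | F(a) and G(a)); every item in the space counts as an F."""
    total = sum(w for c, s, w in space if g(c, s))
    if total == 0:
        raise ValueError("conditioning on a zero-probability event")
    return sum(w for c, s, w in space if s and g(c, s)) / total

def random_wrt(space, g):
    """Check Cr(Sample | F) == Cr(Sample | F and G), the condition displayed above."""
    return cr_sample_given(space, lambda c, s: True) == cr_sample_given(space, g)

def green(colour, sampled):
    return colour == "green"

def grue(colour, sampled):
    # Green and in the sample, or blue and not in the sample.
    return (colour == "green" and sampled) or (colour == "blue" and not sampled)

# Assumed credences: four emeralds, all green, each drawn with chance 1/2.
space = outcomes([("green", Fraction(1, 2))] * 4)

green_is_random = random_wrt(space, green)
grue_is_random = random_wrt(space, grue)
```

Under these assumed credences the sample comes out random with respect to “green” but not with respect to “grue”, because being grue raises the credence of sample membership from 1/2 to 1. Feeding in different credences about colour or about the sampling method would change both verdicts.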

Moreland claims that we can only project the rate of Gness in our samples of Fs when our sampling method is random with respect to F and G. As he stresses, this allows that in some epistemic contexts it would be rational for us to project “grue” and not “green”. All it would take is alternative beliefs about sample randomness. One of the nicest things about Moreland’s account is that it ties the grue puzzle directly to the issue of sampling bias. Many apparent problems with Moreland’s account are merely terminological—for example, redefining “random” instead of introducing a term without any established statistical meaning, and talking of “confirmation functions” rather than probability or credence.

On the currently popular, subjective, forms of Bayesianism, this kind of approach might help to solve the internal challenge of grue, but it is powerless against the external challenge. Versions of objective Bayesianism, of which Carnap’s approach to logical probability and inductive logic is one important variant, might be able to resolve both the internal and external challenges along these lines. However, when understood in this way, Moreland’s randomness condition might be too demanding. Also, though I’m sympathetic to both logical probability and objective Bayesianism myself, I don’t want to overtly assume these views here.Footnote 16 In any case, the historical development of independence-based approaches took a somewhat different path.

The year after Moreland’s APA presentation, an influential article on grue by Frank Jackson appeared. It contains the best-known independence approach. Jackson understands independence counterfactually. He frames his discussion around an inference rule connecting the fact that all Fs that are H have been G, to the claim that all Fs—whether H or not—are G. Jackson calls this the straight rule (SR). It can be formalized in many equivalent ways, including:

$$\begin{aligned} (SR)\qquad \dfrac{\forall x(Hx\wedge Fx\rightarrow Gx)}{\forall x(Fx\rightarrow Gx)} \end{aligned}$$

Obviously (SR) is not deductively valid. Even its inductive goodness varies depending on how many Fs have been sampled.

Jackson argues that we do not need to restrict (SR) to previously distinguished projectible predicates. What we need instead is to recognize a counterfactual condition (CC) on inductive support by way of the straight rule:

figure a

In the case of “grue”, the predicate “\({\textsf{Observed}}\)” itself is such an H. For an observed green emerald \({\textsf{e}}\), we know that:

So all observed emeralds being grue does not support the hypothesis that all emeralds are grue. But also:

So if \(\lozenge \lnot {\textsf{Observed}}({\textsf{e}})\), then it can’t also be the case that:Footnote 17

Thus we can’t know of the observed emeralds that if they hadn’t been observed they wouldn’t be green. So the hypothesis that all emeralds are green is supported by all observed emeralds being green, by the (SR). Similar reasoning applies when H is a different predicate, like “has been cooked” in the lobster example. In my terminology, Jackson claims that knowledge of dependence defeats inductive projection. Those in gruesome practices can’t have knowledge of false things, so if this works it answers the external challenge of grue.Footnote 18
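The counterfactuals at work in this argument can be sketched as follows. This is a reconstruction from the surrounding prose, not a quotation from Jackson: the symbol \(\mathrel{\Box\!\!\rightarrow}\) for the counterfactual conditional, the predicate \({\textsf{Green}}\), and the labels (J1) to (J3) are notational assumptions introduced here.

$$\begin{aligned} &(\text{J1})\quad \lnot {\textsf{Observed}}({\textsf{e}}) \mathrel{\Box\!\!\rightarrow} \lnot {\textsf{Grue}}({\textsf{e}}) \\ &(\text{J2})\quad \lnot {\textsf{Observed}}({\textsf{e}}) \mathrel{\Box\!\!\rightarrow} {\textsf{Green}}({\textsf{e}}) \\ &(\text{J3})\quad \lnot {\textsf{Observed}}({\textsf{e}}) \mathrel{\Box\!\!\rightarrow} \lnot {\textsf{Green}}({\textsf{e}}) \end{aligned}$$

On this reading, (J1) is the known counterfactual that triggers (CC) for “grue”, with “\({\textsf{Observed}}\)” in the role of H. (J2) holds of an observed green emerald, and it excludes (J3) whenever \(\lozenge \lnot {\textsf{Observed}}({\textsf{e}})\). Since (J3) is false it cannot be known, so (CC) leaves the projection of “green” untouched.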

Unfortunately, (CC) doesn’t screen off all problematic predicates. When he introduced grue, Goodman also introduced emeroses—things that are either observed emeralds or unobserved roses. If you have observed many green emeralds, then you have also observed many green emeroses. Clearly you aren’t justified in concluding, on this basis, that all emeroses are green. Yet any observed green emerose would have been green even if it had not been observed, so Jackson’s (CC) doesn’t undermine this problematic inference. Here it is the make-up of the population that counterfactually varies across different observations, not the extension of the predicate being projected.

In a 1980 paper co-written with Robert Pargetter, Jackson revised his approach to handle this case. The revised theory replaces the counterfactual condition with a nomological condition (NC):Footnote 19

figure b

This modifies Jackson’s original proposal in a number of important respects. In (CC) it is knowledge of the dependence of a single property found in the sample that defeats projection; in (NC) it is instead reasonable belief about the independence of both a property found in the sample and the property distinguishing the population that enables projection.

This is an improvement, since (NC) but not (CC) handles the emerose case, but it doesn’t handle every gruesome predicate. Peter Godfrey-Smith pointed out that a simple modification of the “emerose” definition slips past (NC)—something is an emerose\(_{2}\) just in case it is either an emerald or an unobserved rose.Footnote 20 A given observed green emerald would still, by this definition, have been both green and an emerose even if it hadn’t been observed.

There are also semantic tricks that might sidestep some of these counterfactual conditions. Let something be “rigidly grue” just in case it is an actually observed green thing or an actually unobserved blue thing. The “actually” operator anchors the extension of this predicate to the green things observed in the actual world, along with the blue things unobserved in the actual world. So the extension of “rigidly grue” doesn’t vary counterfactually. This was directly built-in by hand, but something similar can happen in more realistic cases. I carefully observe 500 snowflakes, finding them all to have unique shapes. I then say that something is G just in case it has observed shape 1 or observed shape 2 or ... or observed shape 500. Would the observed snowflakes have been snowflakes and G even if they hadn’t been observed? Obviously yes. Yet it would be silly to then conclude that all snowflakes are G.

To get around this we need to proceed more subtly. If we consider the counterfactual situation as actual, many rigidification tricks like this are circumvented. When we do this we imagine reference being fixed for our terms in that world. So understood, the extension of “rigidly grue” does vary in counterfactual situations. And so too does the shape predicate “G”, since its reference was fixed using the shapes of sampled snowflakes. Nothing like this point has been noted in the vast literature on grue, but related points are familiar from discussions of two-dimensional modal logic, so I won’t belabor them here.Footnote 21

None of Wilkerson, Moreland, or Jackson cite each other. All three seem to have hit upon their approaches independently (ahem). They have never before been grouped together as originating a strategy for approaching grue. But all of them are naturally seen as advocating a version of the independence strategy. A key obstacle is in precisely characterizing “independence”. Wilkerson left it completely unexplained, Moreland understood it in terms of conditional probability, and Jackson analyzed it using our attitudes toward counterfactuals. After these pioneering approaches, the independence strategy went dormant for a quarter century. More recently it has made a comeback, in both old and new forms.

4 The search for independence: middle

The independence strategy’s comeback has been driven by reappraisal of Jackson’s approach. Samir Okasha endorsed a view that modifies (CC) to require only belief:

figure c

This makes the rationality of an inductive inference relative to background beliefs. Whether those beliefs are themselves rational is a different story. Okasha’s approach has an easy time with the internal challenge of grue, but is helpless against the external challenge. Additionally, his approach can’t handle the emerose cases.

Alfred Schramm has recently developed a theory similar to Jackson’s. He distinguishes inductive support from confirmation, and claims that observations of Fs only support the hypothesis that all Fs are G when (i) our respective belief that each observed F is G counterfactually depends on our observation of that particular F:

And (ii) we accept the counterfactual:

Condition (i) builds in a dependence of our beliefs upon our observations. This is important in some contexts. Some of the premises of an inductive inference are based on our observations, others are based on different beliefs of ours. A satisfying theory of induction should allow for this. I agree, but worry that counterfactual relations are too coarse-grained to capture the idea.

Condition (ii) is related to Jackson and Pargetter’s (NC). It is weaker in that, while they required rational belief in these counterfactuals, Schramm only requires belief. Schramm’s approach is also less general in two ways. First, in using only “\({\textsf{Observed}}\)” rather than arbitrary properties (this might make it difficult to handle the red lobster case). Second, in not including the modifications needed to deal with the emerose case. And like Okasha’s approach, at best Schramm’s approach only answers the internal challenge of grue since it goes in terms of our beliefs.

Peter Godfrey-Smith, building on Jackson and Pargetter and unpublished work by Alexis Burgess, proposed the following as the “proper form” of an inductive inference:Footnote 22

figure d

The exclusion of strengthenings and weakenings of G is to prevent proper inductive arguments from being, thereby, deductively valid.

The connection to Jackson comes with clause (2.3). The “because” claim is supposed to be strong enough to entail counterfactuals saying that if this F that is O had not been C, it also wouldn’t have been G. Jackson took knowledge of such counterfactuals to undermine induction, but Godfrey-Smith takes their mere truth to undermine induction. When and only when all of his premises are true, we have inductive support for the conclusion.

In grue-like cases, there is a confounding property negating premise (2). For “grue”, being observed is the confounding property; for both definitions of “emerose”, being an emerald is the confounding property. The (JPB) account goes beyond both versions of Jackson’s in being able to block the induction to “all emeroses\(_{2}\) are green”. However, I am not sure that it properly handles emerose\(_{2}\) inductions to different conclusions. Also, Robert Schwartz argued that there are gruesome predicates that this approach doesn’t touch.Footnote 23 The cases in question were first given by Israel Scheffler.Footnote 24 Suppose \({\textsf{e}}_{{\textrm{1}}}\), \({\textsf{e}}_{{\textrm{2}}}\), ..., \({\textsf{e}}_{{\textrm{999}}}\) are all of the emeralds in our sample. Say that something is grue\(_{2}\) just in case it is either one of \({\textsf{e}}_{{\textrm{1}}}\), \({\textsf{e}}_{{\textrm{2}}}\), ..., \({\textsf{e}}_{{\textrm{999}}}\) and green or not one of \({\textsf{e}}_{{\textrm{1}}}\), \({\textsf{e}}_{{\textrm{2}}}\), ..., \({\textsf{e}}_{{\textrm{999}}}\) and blue. Schwartz notes that there isn’t a confounding property here, so on this approach we can conclude that all emeralds are grue\(_{2}\), which is absurd. Whether this is convincing or not might depend on what counts as a property.

So Godfrey-Smith’s approach might not handle all gruesome predicates. Still, it does rule out many of the cases we want ruled out. Unfortunately it also rules out many cases we want ruled in. We encounter many crows, all of them black. The encountered black crows are also melanin-rich, and they are black because they are melanin-rich. Yet some crows, albino crows, are not melanin-rich. Not all black things are melanin-rich, and not all melanin-rich things are black (consider dumping a bunch of melanin into a large canister of green paint), so this property is not ruled out as either a strengthening or weakening of G. So on this account, even after we have seen many crows, all black, our inductive inference to “all crows are black” is unwarranted.

This seems wrong. A central hallmark of inductive inferences is that we can do everything right and yet still end up with a false belief. The problem here is that whenever there is an explanation of why the Fs that are O are G, without all Fs also satisfying this explanation or being G, Godfrey-Smith’s analysis deems the inductive inference unjustified. And there will almost always be an explanation for Gness in our sample, so the (JPB) proposal usually reduces the rationality of an inductive inference to the truth of its conclusion. Simply put, this approach rules out many perfectly good inductive inferences. This would cripple our inductive practices beyond repair. We need an alternative way to handle emerose\(_{2}\)-cases.

Recently Wolfgang Freitag has developed a different kind of independence approach. The key novelty is a focus on doxastic dependence relations, that is, counterfactual relations that hold between different beliefs of ours. He builds this on a foundation that resembles Wilkerson’s account. The population of Fs is exhausted by the union of the sampled Fs and unsampled Fs. Freitag says that a predicate P is epistemically discriminating for us just in case we both (i) know that all sampled Fs are P, and (ii) know that all unsampled Fs are not P. Epistemically discriminating predicates cannot be projected, as (ii) directly defeats their projection. So a predicate like “\({\textsf{Sample}}\)” is epistemically discriminating and cannot be projected. Epistemic discrimination is not an intrinsic feature of predicates. It is instead relative to what we know in a given context.

In addition to direct defeat, we also need a notion of indirect defeat. We can’t simply say that any predicate whose application is entailed by application of a directly defeated predicate is thereby indirectly defeated. Suppose that David has observed many emeralds being green, so he infers that each of those emeralds is also green or in our sample. On the basis of these premises, he concludes that “all emeralds are either green or in our sample”. This reasoning is fine. Now consider Nelson, who observes that all the emeralds in our sample are in our sample, so infers that each of them is also green or in our sample. On the basis of these premises, he concludes that “all emeralds are green or in our sample”. This reasoning is not fine. The very same predicate can be indirectly defeated in one context but not in another.

To accommodate this, Freitag offers the following construal of derivative defeat:

figure e

Freitag characterizes this counterfactually (while admitting it might be primitive). Say that G epistemically depends upon H just in case if we didn’t believe that something was H, we wouldn’t believe that it was G:

The grue hypothesis is derivatively defeated because the inductive evidence for grue epistemically depends upon the inductive evidence for projection of “\({\textsf{Sample}}\)”, which is an epistemically discriminating predicate and is thus directly defeated.

One hiccup is that, as defined, the notion of epistemic discrimination is too demanding. The knowledge that at least one unsampled F is not P is sufficient for direct defeat. As we just saw, Godfrey-Smith adopts this weaker requirement—though only in terms of requiring that the weaker condition is true, not that we know it is true. This weakening is a step in the right direction if we want to answer the external challenge of grue.

Despite its many virtues, there are some problems lurking for Freitag’s theory. In the counterfactual sense, we can always find a predicate upon which any G epistemically depends. If I didn’t believe that \({\textsf{e}}\) was a material object, then I wouldn’t believe it was green in the relevant sense. This isn’t automatically a problem, by itself. Yet related cases might pose a problem. If I didn’t believe that \({\textsf{e}}\) had “looked green when I observed it”, then I wouldn’t believe that it was green. And this predicate is epistemically discriminating over the population of emeralds, so we seem to have defeated the projection to “all emeralds are green”. This is related to the simpler worry that all predicates epistemically depend upon the directly defeated “\({\textsf{Sample}}\)”, so all induction is undermined. Freitag responds to this concern by saying the dependence here is genetic and not epistemic.Footnote 25 He spells this out in the manner of Schramm’s condition (i). As was the case there, I think it would probably be better to proceed non-counterfactually.

Even if Freitag’s proposal worked perfectly, it would need supplementation to answer the external challenge. It also doesn’t apply cleanly to certain types of alternative languages, where “grue” is a primitive term. And there are types of sample bias that it leaves entirely to one side. Induction can be externally undermined by having a biased sample, even if we don’t know it. We need to account for these kinds of cases to fully solve the external challenge of grue.

It is time to reflect on the lessons learned during this visit from the ghost of independence past, and to make a new attempt. There are dangers on each side. Some accounts rule out too much, others too little. Like Goldilocks, we need an account that is instead just right. In what follows I try to improve on past accounts, combine them into a single theory, and feed the total package into a more varied and subtle epistemological picture.

5 The search for independence: end?

In everyday inductive reasoning we move directly from all Fs in our sample being G to all Fs whatsoever being G:

figure f

My focus is on pure cases of induction: cases where abductive reasoning concerning Fs and Gs plays no direct role in reaching the conclusion.Footnote 26 So understood, mathematical statistics tells us that inductive goodness requires that our sample is large:

(2) Our sample of Fs is large enough.

The law of large numbers allows us to understand “large enough” in an absolute sense, rather than as a proportion of the overall population of Fs. A “large enough” sample is one for which the probability that the G-rate in the sample “matches” the G-rate in the population is sufficiently high. But both what counts as “matching” and what counts as “sufficiently high” can vary from context to context. This should be understood throughout. By anyone’s lights, that our sample is large enough, in this sense, is crucial to accounting for the reliability of induction. Donald Williams ingeniously argued that it was also nearly sufficient.Footnote 27 Many have disagreed, but even leaving those disagreements to one side, the grue puzzle shows that we need something more.Footnote 28
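One standard way to illustrate the absolute reading of “large enough” is Hoeffding’s inequality. For n independent draws it bounds the chance that the sample rate misses the population rate by a margin of at least \(\varepsilon \): the chance is at most \(2e^{-2n\varepsilon ^{2}}\), and the size of the population appears nowhere in the bound. The sketch below is an illustration under that independence assumption and is not a calculation from the text. The function names, and the particular values chosen for the margin `eps` (standing in for “matching”) and the tolerated failure chance `delta` (standing in for “sufficiently high”), are hypothetical.

```python
import math

def mismatch_bound(n, eps):
    """Hoeffding upper bound on P(|sample rate - population rate| >= eps)
    for n independent draws. Population size does not appear."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))

def large_enough(eps, delta):
    """Smallest n for which the Hoeffding bound is at most delta."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

# Assumed context: "matching" means within 5 points, and "sufficiently
# high" means a failure chance of at most 5 percent.
n_needed = large_enough(eps=0.05, delta=0.05)
```

With these settings `n_needed` is 738, whether the population has ten thousand members or ten billion. The bound is conservative, so smaller samples may also suffice in practice, and a stricter context simply means smaller values of `eps` or `delta`.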

While many philosophers have looked for this something more outside of statistical methodology, the independence strategy looks inside. Any good inductive inference requires that the evidence gathered about Gness over F is robustly independent of the ways it was gathered. Each independence theory has offered an account of the needed independence. As we’ve seen, all of the accounts have issues. The lesson I draw from them is that there are at least three distinct, relevant types of independence. I discuss two of them here, and jointly use them to characterize robust independence. The third concerns the broader epistemic context of inductive inference and is dealt with in the following section.

\(({\textrm{I}})\) Methodology. Let’s distinguish between sampling methods and observation methods. A sampling method is a way of drawing some Fs. And an observation method is a way of testing the Fs so-drawn for Gness. Experimenters recognize that F and G need to be independent of both our sampling and observation methods. Typically, actual scientists are concerned only with causal independence. Suppose we are trying to project the lung cancer rate in the US population from a large sample of US citizens. If we screen for lung cancer using a magical radioactive X-ray machine that invariably and immediately causes lung cancer, we are moral monsters. If we additionally draw the conclusion that everyone has lung cancer, we are also poor reasoners. Causal dependence isn’t the only way things can go awry. The predicate “grue” is dependent upon any observation method, almost by definition. We need to formulate our independence condition so that it screens out not only “grue”, but also all other confounding factors.

Above we learned that in attempting this we need to consider the relationship between the make-up of the population and our observations. Otherwise, like Jackson’s original account, we will handle “grue” without handling “emerose”. Jackson and Pargetter already modified the counterfactual approach to handle this case, but I depart from them in four ways. First, unlike them, I am aiming to characterize a factual notion of independence, so I do not yet build in anything about our beliefs. Second, I note that we should ultimately formulate things so as to avoid rigidification tricks, like those used to define “rigidly grue”. Third, I proceed more generally, by explicitly building both the method of sampling, \(M_{S}\), and the method of observing, \(M_{O}\), into my account. Fourth, I include some general facts, not just reasonable beliefs about sampled objects.

Suppose we have sampled some Fs using method \(M_{S}\), and found them all G using method \(M_{O}\). We need the facts about both the Fness and Gness of the objects in our sample to be independent of our methods of sampling and observation. This requires that Fness and Gness are not changed by either \(M_{S}\) or \(M_{O}\):

Methods \(M_{S}\) and \(M_{O}\) are methodologically independent of F and G if and only if neither the F-population nor the distribution of G over the F-population would be altered by applications or non-applications of \(M_{S}\) or \(M_{O}\), in relevant situations considered as actual.

This requires that the Fs in our sample, observed to be G, would still have been both F and G even if they had been unsampled by \(M_{S}\) and unobserved by \(M_{O}\). Of the unsampled and unobserved Fs, it does not require any particular distribution of G over them, only that the distribution is not altered by application of \(M_{O}\), and that the make-up of Fs is not altered by \(M_{S}\). As such, it covers all of the non-doxastic counterfactuals required by previous independence approaches, though it is actually slightly stronger in requiring that the F and G-facts are left totally unchanged by our sampling and observation methods.Footnote 29
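A toy model may help fix ideas. In the sketch below a predicate is a function of an object’s intrinsic state together with two flags recording whether the sampling method and the observation method have been applied to it. A pair of predicates passes the check when toggling those flags never changes any object’s F-status or G-status. The encoding (the `Obj` record, the four-way toggle, and all function names) is an assumption made for illustration. It captures only the crude question of whether the facts would change, and it does not model situations considered as actual.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Obj:
    kind: str    # e.g. "emerald" or "rose"
    colour: str  # e.g. "green" or "blue"

# Each predicate takes (object, sampled, observed) and returns a bool.
def emerald(o, sampled, observed):
    return o.kind == "emerald"

def green(o, sampled, observed):
    return o.colour == "green"

def grue(o, sampled, observed):
    return (o.colour == "green" and observed) or (o.colour == "blue" and not observed)

def emerose(o, sampled, observed):
    return (o.kind == "emerald" and observed) or (o.kind == "rose" and not observed)

def methodologically_independent(f, g, objects):
    """True if no object's F-status or G-status varies across the four
    combinations of applying or withholding the two methods."""
    settings = list(product([False, True], repeat=2))
    for o in objects:
        f_values = {f(o, s, ob) for s, ob in settings}
        g_values = {g(o, s, ob) for s, ob in settings}
        if len(f_values) > 1 or len(g_values) > 1:
            return False
    return True

# A hypothetical three-object world.
world = [Obj("emerald", "green"), Obj("emerald", "green"), Obj("rose", "red")]

green_ok = methodologically_independent(emerald, green, world)
grue_ok = methodologically_independent(emerald, grue, world)
emerose_ok = methodologically_independent(emerose, green, world)
```

In this model the emerald and green pair passes. The grue case fails because the G-facts shift with observation, and the emerose case fails because the make-up of the F-population shifts with observation.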

We probably don’t need to make a confounding property—like Jackson’s “H” or Godfrey-Smith’s “C”—completely explicit here, though we could easily do so. Any such confounding factors are involved in application of our methods (like when my method of sampling and observing lobsters involves ordering lobster tail for dinner) and others will be handled below. This definition handles both “grue” and “emerose” just like Jackson and Pargetter’s did.Footnote 30 And the final clause ensures that rigidification tricks don’t trip things up. Consider a reading of Scheffler’s grue\(_{2}\) case where the property of being one of \(\textsf{e}_{{{\textrm{1}}}}\), \(\textsf{e}_{{{\textrm{2}}}}\), ..., \(\textsf{e}_{{{\textrm{999}}}}\) is specified by satisfying “\({\textsf{Sample}}(x)\)”. In that case, the property’s extension will vary in other worlds, considered as actual. It arguably doesn’t handle a deeply rigid version of Scheffler’s case—see the next section for that.

It seems to handle the modified emerose case, since different observations being made would change the extension of “emerose”. Even if not though, the emerose case is handled by the next condition. Some other weird cases might also slip through. Say that a “nemerald” is an emerald located in North America. I gather and observe thousands of emeralds throughout North America, and then conclude that all emeralds are nemeralds. The emeralds in my sample would still have been both emeralds and nemeralds even if they hadn’t been sampled or observed. The emeralds not in the sample that are in North America would have remained nemeralds, even if sampled. The emeralds not in North America are not nemeralds, and wouldn’t have been even if sampled and observed, at least for most ways of sampling and observing. We might instead try to claim that if sampled and observed, they would have been in North America with us, but this move abuses counterfactual similarity a bit too much. Again though, even if this case slips through, the next condition handles it.

Methodological independence is a key part of statistical methodology. Experimenters take care that their sampling of Fs and checking them for G doesn’t alter the G-facts or destabilize the F-population. In a normal textbook on experimental design the warnings are against causal failures. Goodman’s novelty was in using semantic tricks to ensure definitional failures of independence. If we were only concerned with the original grue and emerose cases, methodological independence would be enough. But as we’ve seen, there are other problem cases to screen out.

\(({\textrm{II}})\) Partitions. Recall that something is an emerose\(_{2}\) just in case it is either an emerald or an unobserved rose. I don’t think this slips through the methodological independence net. Even if it does though, that is fine. We can say that it is of a kind with a third variant—something is an emerose\(_{3}\) just in case it is either an emerald or a rose. This definition is not relativized to sampling or observations at all. Yet if we observe only emeralds, find them all green, then try to project “all emeroses are green”, we run into trouble in the exact same way we do with “emerose\(_{2}\)”. The difference is that with “emerose\(_{3}\)” our having an unrepresentative sample could have been avoided by observing some roses. Not so with “emerose\(_{2}\)”.

The problems here are more akin to selection bias than to the observer effect. As such, we shouldn’t expect them to be ruled out by methodological independence. Godfrey-Smith’s “emerose\(_{2}\)” case makes any sample a biased one, by definition. This is analogous to Goodman’s definitional guarantee of methodological dependence for “grue”. And the “emerose\(_{3}\)” case can lead to selection bias by chance, but of a kind that can still be ruled inadmissible a priori.

This and similar cases involve a non-vacuous partition of the population of Fs into \(F_{1}\) and \(F_{2}\). Since it is non-vacuous, some Fs are \(F_{1}\)s and some are \(F_{2}\)s. And since it is a partition, every F is either an \(F_{1}\) or an \(F_{2}\) and no F is both. The problems come when our sampling method \(M_{S}\) draws from only one side of the partition. With “emerose\(_{2}\)”, any sampling method does so necessarily. With “emerose\(_{3}\)”, whether this is so will depend upon features of \(M_{S}\). If the method involves only finding emeralds, then the method is problematic for “emerose\(_{3}\)”. It is problematic because it only ever draws from one side of the partition. What matters is whether there is a partition of the Fs that our sampling method invariably fails to cut across.

What condition rectifies this? We could try to require that every F can possibly be \(M_{S}\)-sampled, but that would be too weak. If some particular F could only be \(M_{S}\)-sampled in a world where the laws of physics were drastically different, that shouldn’t increase our trust in \(M_{S}\) in our world. What matters is whether each F can, in principle, be sampled via \(M_{S}\) in situations that are as similar as possible to the actual world, modulo different uses of \(M_{S}\) itself. This is to consider as actual situations where we apply our methods somewhat differently, but as far as is otherwise possible, everything else remains the same. So if our method is gathering emeralds, we consider situations where different emeralds are gathered. Say that such a possible world is a minimal \(M_{S}\)-variant of the actual world. We sample some Fs using \(M_{S}\) in the actual world, and in a minimal \(M_{S}\)-variant world, we sample some other Fs using \(M_{S}\), or sample fewer Fs using \(M_{S}\) than we actually did.

So our condition is that we need each F to be accessible to sampling method \(M_{S}\) in a minimal \(M_{S}\)-variant of the actual world:

Method \(M_{S}\) is partition independent over F if and only if for any F, there is a minimal \(M_{S}\)-variant world where that F is sampled by \(M_{S}\)
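
The quantifier structure of this definition can be made explicit in a short sketch. The notation is an assumption introduced here for illustration and is not part of the definition itself: \(\mathcal{V}(M_{S})\) stands for the set of minimal \(M_{S}\)-variant worlds, \(\mathrm{Samp}_{M_{S}}(x, w)\) says that x is sampled by \(M_{S}\) in world w, and x ranges over the actual Fs.

\[
M_{S}\ \text{is partition independent over}\ F \iff \forall x\,\bigl(Fx \rightarrow \exists w \in \mathcal{V}(M_{S})\;\mathrm{Samp}_{M_{S}}(x, w)\bigr)
\]

On this rendering the witnessing world may differ from one F to the next. The condition does not require a single world in which every F is sampled.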

To understand this definition correctly, it is crucial to remember that minimal \(M_{S}\)-variant worlds are generally as similar as possible to the actual world, modulo applications of \(M_{S}\). Here I will assume that the focus is only on sampling methods, but observation methods could easily be included in this definition if required.

There are two different ways that this can be misunderstood. The first way of misunderstanding approaches applications of \(M_{S}\) in an entirely formal fashion. This is wrong. Instead, the condition should be interpreted as appealing to a previously understood notion of sampling methods, with applications of a given method being the relevant events in the worlds where said method is used (by us). There is some indeterminacy about whether certain very similar methods count as “the same”, but I don’t think that this is especially problematic. The intuitive notion of a sampling method is fairly coarse-grained, but not so coarse-grained that it is vacuous.

The second way to misunderstand partition independence concerns minimality. If we apply the same method, but in addition shoot all emeralds into the Sun or destroy the Earth, we are no longer in a minimal \(M_{S}\)-variant of the actual world. Since each minimal \(M_{S}\)-variant world is a possible world, this condition entails that each F can be \(M_{S}\)-sampled. But not vice versa. Partition independence handles both modified emerose cases, as well as the nemerald case. If our sampling method is inherently limited to North America, then it is not partition independent over emeralds.

Partition independence is so-named because it entails that for any partition of the Fs, our method does not only draw from one side of the partition in worlds like ours. I think this is part of what people are striving for when they demand a fair sample. But partition independence is weaker than the above-mentioned standard notion of a “random” sample. When a sampling method is random and non-trivial, it is partition independent, but not automatically vice versa. If needed, we could strengthen partition independence. We could order minimal \(M_{S}\)-variant worlds by similarity to the actual world. Then we could require that each F is \(M_{S}\)-sampled in a sufficiently similar minimal \(M_{S}\)-variant world. This would bring partition independence toward true randomness without automatically going all the way.
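
The entailment that gives partition independence its name can be spelled out in a few lines. The notation is assumed for illustration only: \(\mathcal{V}(M_{S})\) is the set of minimal \(M_{S}\)-variant worlds, and \(\mathrm{Samp}_{M_{S}}(x, w)\) says that x is sampled by \(M_{S}\) in world w. Suppose \(M_{S}\) is partition independent over F, and let \(F_{1}\) and \(F_{2}\) be any non-vacuous partition of the Fs, with a an arbitrary \(F_{1}\) and b an arbitrary \(F_{2}\).

\[
\begin{aligned}
&\text{(i)}\quad \exists a\,F_{1}a \ \wedge\ \exists b\,F_{2}b && \text{(the partition is non-vacuous)}\\
&\text{(ii)}\quad \exists w_{a} \in \mathcal{V}(M_{S})\;\mathrm{Samp}_{M_{S}}(a, w_{a}) && \text{(partition independence, since } a \text{ is an } F\text{)}\\
&\text{(iii)}\quad \exists w_{b} \in \mathcal{V}(M_{S})\;\mathrm{Samp}_{M_{S}}(b, w_{b}) && \text{(partition independence, since } b \text{ is an } F\text{)}\\
&\text{(iv)}\quad \text{neither } F_{1} \text{ nor } F_{2} \text{ goes unsampled in every world in } \mathcal{V}(M_{S}) && \text{(from (ii) and (iii))}
\end{aligned}
\]

This shows only that no side of the partition is invariably missed. A particular actual sample can still fall entirely on one side by luck, as with “emerose\(_{3}\)”.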

Even in its original form, partition independence is very strong. We might also want to weaken it to require only that the “overwhelming majority” of Fs can be reached by \(M_{S}\) in \(M_{S}\)-variant worlds. Otherwise a single \(M_{S}\)-inaccessible F, say one that fell into a black hole, would undermine the use of \(M_{S}\) in gathering a sample on which to base an inductive projection. That wouldn’t fit how we think about our inductive practices. We also want to be fairly liberal when considering alternative applications of our sampling methods.

This connects to the general question of whether partition independence is, even when weakened in the indicated ways, too demanding. We (seemingly) use induction to draw conclusions about objects that are distant from us in space. We also (seemingly) use induction to draw conclusions about objects that are distant from us in time, whether in the past or the future. This can involve situations in which a different population of Fs exists, compared to the currently existing Fs. Yet even if our F-population is meant to be all of the Fs that exist in our world across the past, present, and future, the general problem remains. The problem is that our actual sampling methods arguably aren’t partition independent over spatially or temporally distant Fs, unless we illicitly treat our methods as if they were magical.Footnote 31

It sounds a little strange at first, but sometimes this might be the correct result. Correct because, as I noted at the start of this section, I am here concerned with pure cases of induction. And the cases that cause trouble are all impure. In actual cases of impure induction, there is usually an implicit, background reliance on abduction or inference to the best explanation.Footnote 32 This is related to something I already noted above, and that both Moreland and Godfrey-Smith also noted: we have to distinguish induction from abduction when approaching the grue puzzle, even though Goodman himself ran them together. My goal here is to answer the new riddle of induction, pure induction. Some of our actual inductions depend on background applications of abduction.

A complete theory of non-demonstrative reasoning must provide a full treatment of these mixed cases. Here though, the crucial point is that partition independence is not too strong for pure inductive reasoning. I say this while bending over backwards to be fair to the objection.

A different reply, which I find equally plausible, simply denies that the objection is correct. This denial can be supported by noting that the correct way to understand “alternative” applications of our sampling method already implicitly builds in standard spatial and temporal variations. This arguably allows pure induction to overcome the apparent temporal and spatial limitations of our sampling methods, without any boost from abduction.

Both methodological and partition independence implicitly concern Fs that are not in our sample. This is not a problem, since no questions are begged about the G-rate over the Fs. And partition independence is really a relation between our sampling method and “F”, so it is general, rather than particular. In neither case do we build in or assume that our induction succeeds at reaching a true conclusion, so nothing problematically circular has been assumed about any particular case.

The notion of “robust independence” that I used as a covering term for the dual notion of independence has now been characterized as involving two parts:

Methods \(M_{S}\) and \(M_{O}\) are robustly independent of F and G if and only if both (i) \(M_{S}\) and \(M_{O}\) are methodologically independent of F and G and (ii) \(M_{S}\) is partition independent over F

In order to project the rate of Gness over Fs in our sample to the Fs as a whole, we need this independence condition to be satisfied. I think this covers all and only the things that people in the central Jackson-derived line wanted covered. Yet there is a third type of independence, concerning not our methods but rather the epistemic context in which inductive reasoning takes place. This is what led both Schramm and Freitag to consider counterfactual relations among our beliefs. This aspect needs to be addressed as well.

6 The broader context

The broader epistemic context of a bit of inductive reasoning actually matters in at least two different ways. The first concerns independence only in a very general way; the second is a third major form of dependence that independence theories have been trying to capture.

\(({\textrm{I}})\) Outside Evidence. Some of the things we know and believe interact poorly with a given inductive inference. Sometimes this happens directly. If you already know that the conclusion of an inductive inference is false, then clearly you aren’t warranted in performing said inference. Other times it is indirect. You might know about a partition of the Fs that is relevant to G-rates, and also that your sampling method drew from only one side of said partition. The general issue is that you can know something at odds with the conclusion of an inductive argument.Footnote 33

Consider an epistemic context c, in which an inductive inference, I, with the conclusion that all Fs are G, is launched:

I is unblocked in c if and only if there is no outside evidence in c suggesting that the G-rate of sampled Fs is relevantly different than the G-rate of unsampled Fs—it is blocked otherwise

I’m using “outside evidence” as a covering term. It includes evidence of all kinds. Obviously, when you already have conclusive evidence that I’s conclusion is false, I is blocked. But this definition also applies in more subtle ways.

This condition says “suggesting that” the rate is different instead of “entailing” that the rate is different. This is because I want to allow for merely probabilistic connections. I also say “relevantly different”, because—as noted in the brief discussion of “matching” at the start of the previous section—there might be slight and irrelevant differences that fall within what we would tolerate or treat as “the same” in a given context. We want to focus on cases where outside evidence suggests that the G-rate in the sample can’t appropriately be projected onto the population as a whole, given our standards in the context at hand.Footnote 34

In most situations in which we have outside evidence that the G-rates of sampled and unsampled Fs significantly differ from each other, our inductive inferences about these rates will be blocked. As an illustration, consider again the case of “emerose\(_{3}\)” and suppose our sampling method is partition independent. Still, by luck we could end up with a sample consisting only of emeralds, all of which are green. Can we then conclude that all emeroses\(_{3}\) are green? Not usually, because in most contexts we already know that roses are unlikely to be green. So the inductive inference here is blocked in most epistemic contexts. In other contexts, it might not be, which could mean that in those situations, our inductive inference from an unrepresentative sample is in epistemic good-standing. This is a feature, not a bug.

What if our outside evidence is itself inductive? Say that an “\({\textsf{fhemerald}}\)” is an emerald that is near a fire hydrant, and a “\({\textsf{fhemerose}}\)” is either a fhemerald or a rose. We observe some emeralds, none of which is an fhemerald, all of which are green. We also observe some fhemeroses, none of which are fhemeralds, none of which are green. So we have one inductive argument for the conclusion that all fhemeralds are green, and another for the conclusion that all fhemeralds are not green. Are both of the inferences blocked? We might instead restrict “outside evidence” to leave aside cases of inductive opposition. In many cases, there will be non-inductive evidence that will block one argument but not the other. But when the only relevant evidence on either side of a conflict is inductive, neither is blocked. We can’t accept both, so we would and should try to resolve the conflict by gathering more evidence. Again: this is a feature, not a bug.

\(({\textrm{II}})\) Basing. If your evidence about G is epistemically based on application of a predicate you already know you can’t project, then you can’t justifiably project G either. Freitag’s account focused on this point. But his approach only ruled out predicates whose application was based on applications of predicates you knew applied only to your sample. This doesn’t rule out nearly enough. We need to liberalize the constraint.

Both Schramm and Freitag gloss their proposals counterfactually, but I think this is too coarse-grained. The real issue is inference. If you infer that q, from—among other things—your belief that p, then the former belief is (partially) inferred from the latter. This is direct inferential dependence. If there is a chain of direct inferential dependencies between the belief that q and the belief that p, then your belief that q inferentially depends upon your belief that p. This is a one-many psychological relationship. Using it, we can say, in epistemic context c, for inductive inference, I:

I is well-based in c, if and only if I’s premises do not inferentially depend on (the premises of) an inductive argument, \(I^{*}\), that is blocked or otherwise undermined in c—it is non-well-based otherwise
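
The structure of this condition can be sketched more formally. The notation is an assumption introduced for illustration: \(q \leftarrow_{1} p\) says that the belief that q is directly inferred, at least in part, from the belief that p; \(\mathrm{Prem}(I)\) is the set of beliefs in the premises of I; and \(\mathrm{Und}(I^{*}, c)\) says that \(I^{*}\) is blocked or otherwise undermined in c. Inferential dependence is then the transitive closure of direct dependence:

\[
q \leftarrow^{+} p \iff \exists r_{0}, \ldots, r_{n}\ (n \ge 1)\ \bigl(r_{0} = q \ \wedge\ r_{n} = p \ \wedge\ \forall i < n\;\; r_{i} \leftarrow_{1} r_{i+1}\bigr)
\]

\[
I\ \text{is well-based in}\ c \iff \neg\exists I^{*}\,\exists q \in \mathrm{Prem}(I)\,\exists p \in \mathrm{Prem}(I^{*})\;\bigl(\mathrm{Und}(I^{*}, c) \ \wedge\ q \leftarrow^{+} p\bigr)
\]

This is one reading of the definition, on which a single premise of I depending on a single premise of an undermined \(I^{*}\) is enough for I to be non-well-based. Because \(\mathrm{Und}\) is relative to c, the same argument can come out well-based in one context and not in another.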

This fixes both issues I raised for Freitag’s proposal at once while preserving its good features. Inference is not merely counterfactual, and the defeat condition is much less demanding. Sometimes the very same inductive argument can be well-based in one context, but non-well-based in another. This is what we want.

This condition rules out some potentially lingering problem cases. Recall the rigid reading of Scheffler’s grue\(_{2}\) case, where being one of \({\textsf{e}}_{{{\textrm{1}}}}\), \({\textsf{e}}_{{\textrm{2}}}\), ..., \({\textsf{e}}_{{\textrm{999}}}\) is specified not in terms of our sample, but directly, and thus, rigidly. We believe the premises, saying that each of the emeralds \({\textsf{e}}_{{\textrm{1}}}\), \({\textsf{e}}_{{\textrm{2}}}\), ..., \({\textsf{e}}_{{\textrm{999}}}\) is grue\(_{2}\), partly on the basis of believing that each emerald \({\textsf{e}}_{{\textrm{k}}}\) is among \({\textsf{e}}_{{\textrm{1}}}\), \({\textsf{e}}_{{\textrm{2}}}\), ..., \({\textsf{e}}_{{\textrm{999}}}\). But the inductive inference from these premises to the conclusion that each emerald is among \({\textsf{e}}_{{\textrm{1}}}\), \({\textsf{e}}_{{\textrm{2}}}\), ..., \({\textsf{e}}_{{\textrm{999}}}\) is then blocked in this context. Our evidence a priori entails that, for this G, the G-rate among unsampled Fs is zero.

If an inductive inference I is warranted in context c, then I must be both unblocked and well-based in c. But this is a necessary condition on the epistemic goodness of an inductive inference, not a sufficient one. To finally bring everything together to answer the internal and external challenges of grue, we need to trek a bit further into the wilds.

7 Our epistemological jungle

Consider again a standard, universal inductive inference:

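[Figure g: the standard universal inductive inference, from premise (1), that all sampled Fs are G, to conclusion \(({\textrm{C}})\), that all Fs are G.]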

Of course, not all inductive inferences are universal—we often move from some other percentage of the sampled Fs being G to concluding that the same percentage of the Fs, sampled or not, are G. But this complication doesn’t really change the overall epistemic situation, so I’ll stick with the universal case for simplicity.

Bringing everything together, I’ve introduced the following background conditions:

(2) Our sample of Fs (gathered using \(M_{S}\)) is large enough.

(3) Methods \(M_{S}\) and \(M_{O}\), population F, and property G are robustly independent of each other.

(4) The inference from (1) to \(({\textrm{C}})\) is unblocked and well-based in the current context.

Of course, as we’ve seen, the conditions (3) and (4) might both justly be treated as two distinct conditions. This won’t make a difference though.

Induction in the wild typically involves inferring (\({\textrm{C}}\)) directly from (1). The inference is not deductively valid, and nothing like premises (2) and (3) and (4) is ever made explicit outside of the statistics class or the philosophy room. What attitudes or status, if any, do reasoners need to have toward these epistemic background claims in order for the inference to be warranted? The most demanding answer is that reasoning from (1) to \(({\textrm{C}})\) is only warranted in a given context if you also know, or at least are warranted in believing, (2), (3), and (4) in that same context. This is far too demanding. On some readings it entails that no inductive inference was ever warranted before careful statisticians and pedantic philosophers came onto the scene.

A better option is to say that the inductive inference is externally warranted just in case these background premises are true. When they are true in our world, in our current context, our statistical models tell us that the given inference is broadly reliable. There is then a sense in which the truth of the premise makes the conclusion objectively probable when (2) and (3) and (4) are all true. When this happens, things haven’t worked out just by chance. We did not just get lucky. Instead, we did something right. This is the answer to the external challenge of grue.
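
A simple worked case illustrates the kind of reliability claim at issue. It assumes something stronger than robust independence, namely that the n sampled Fs are drawn independently and uniformly at random from the F-population, and the numbers are chosen purely for illustration. If only a proportion p of the Fs are G, then the probability that every one of the n sampled Fs is G is

\[
\Pr(\text{all } n \text{ sampled } F\text{s are } G) = p^{n}
\]

With \(p = 0.95\) and \(n = 100\), this is \(0.95^{100} \approx 0.006\). Under these assumptions, a uniformly G sample of that size would be very unlikely unless nearly all Fs were G. The sketch is meant only to show why the sample size in (2) matters; it does not replace the conditions in (3) and (4).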

We are externally warranted in projecting “green”, but the grue lover is not externally warranted in projecting “grue”. As usual with external constraints, reasoners don’t even need to be able to formulate the background conditions themselves. It is the mere truth of the background conditions that warrants reasoners in accepting the conclusion on the basis of the premise, in an externalist sense. If these background conditions are false, the tie between the truth of the premise and the truth of the conclusion is significantly weakened. When this happens, your inductive reasoning gets things right, if at all, by chance. Buy a lottery ticket.

For simple creatures, the epistemic story ends here. For sophisticated creatures, it does not. We can have beliefs about the truth and falsity of (2) and (3) and (4). The birds and the bees cannot, even in principle. Still, the same story holds for all when it comes to external warrant. We don’t have to believe—let alone know or warrantedly believe—(2) and (3) and (4) to have externalist warrant for reasoning from (1) to \(({\textrm{C}})\). But our cognitive sophistication leads us into an epistemological jungle. There is something internally incoherent about reasoning from (1) to \(({\textrm{C}})\) while at the same time disbelieving any of (2), (3), or (4). This is so whether or not they are true.

Even if these background claims are false, if you believe them true, you are weakly internally warranted in reasoning from (1) to \(({\textrm{C}})\). If you warrantedly believe them, then you are strongly internally warranted in reasoning from (1) to \(({\textrm{C}})\). If you disbelieve any of them, you are weakly internally unwarranted in reasoning from (1) to \(({\textrm{C}})\). And if you warrantedly disbelieve them, you are strongly internally unwarranted in reasoning from (1) to \(({\textrm{C}})\). Grue lovers, in our world, can be weakly internally warranted in their gruesome reasoning. Some may even be strongly internally warranted. This will depend on their evidence in a given context. These internal notions of warrant answer the internal challenge of grue. And these notions themselves all split into two, depending on whether the warrant for believing the background conditions itself is external or internal. I won’t pursue all of the options and attending complications further. We are both externally and strongly internally warranted in projecting “green”. Grue lovers in our world, with our kind of evidence, are neither externally nor strongly internally warranted in projecting “grue”.

Can strong internal warrant, of a robustly internalist kind, be had without problematic circularity? I think so, at least according to many internalist epistemic theories. This might require having internally warranted beliefs about the background conditions. I think this is possible. You can certainly have a warranted belief that your methods are methodologically and partition independent, even without knowing anything about the G-rate in the unsampled Fs. As noted above, nothing about the success or rationality of the particular bit of inductive reasoning in question is thereby assumed. Warrant for general beliefs of this kind might come from previous inductive reasoning, or from previous abductive reasoning, or a mixture of both. Each case is different.

Sometimes the similarity of Fs to each other will be relevant in our reasoning, but not always. Suppose we’re studying a population of assorted objects in a large, drained swimming pool. Our sampling method involves using a gigantic net to fish the objects out. In this case we might be able to reason, from the dimensions of the pool and the features of our net, that anything in the pool can be scooped out by the net.Footnote 35 This reasoning appeals to general things we believe about the world, of course, but it is not problematically circular.

On the most demanding and thorough-going versions of epistemic internalism, this may not be enough. Yet at exactly this point we have moved from the new riddle of induction back to the old one. I don’t think it is perfectly clear what the traditional, Humean problem of induction even is, but solving some versions of the old riddle seems to require vouchsafing the rationality of induction in internalist, foundationalist terms, without any circularity. This is not something I will attempt here, but it is also not something that is required for a solution to either grue puzzle.

My independence solution covers everything covered by previous independence theories, without anything slipping past, and without leaving anything out. Perhaps devious counterexamples will be devised? Perhaps, but I suspect that any needed refinements will be further along in the directions I have indicated, so that a sharpening or refinement or combination of the types of independence I have identified will do the trick. So in this sense, I call the approach I have developed the independence solution to grue.

The independence solution stands on its own, but it is compatible with other orientations. Bayesians can use dependence to explain why gruesome hypotheses are assigned low prior probabilities, either subjectively or objectively. Natural kind lovers can point out that gerrymandering increases the risk of unexpected partition-dependence. And advocates of pragmatic approaches can adopt the account of reliability and feed it into their favored theories of cost and decision. The epistemological jungle grows thick. Some of the thickest bush surrounds grue, but a small path has been cut through. Come this way.