1 Introduction

A five-horse race is about to start. You don’t know which horse is going to win, but here are their respective winning chances:

[Table: Ajax 60%, Benji 18%, Cody 14%, Dusty 6%, Ember 2%]

Take a guess: Who do you think is going to win?

There are many guesses you could make. But some guesses are clearly better than others. While ‘Ajax’ and ‘Ajax, Benji, or Cody’ both seem like fine guesses, ‘Dusty’ and ‘Cody or Ember’ both seem like terrible ones. What explains the difference? What makes for a good guess? That is the question I want to address in this paper.

There are at least two reasons for being interested in this question. First, as we will see, good guesses turn out to exhibit some striking patterns that call out for explanation in their own right. Second, the notion of guessing has itself been put to explanatory use in recent attempts to shed light on the cognitive attitudes of thinking and believing (Hawthorne et al., 2016; Holguín, 2022). I won’t here take a stance on whether these deployments of the notion of guessing are apt. But if the notion of guessing is to do substantive work in epistemology or elsewhere, it would at least be good to have a basic understanding of what makes for a good guess.

Perhaps the most systematic attempt to develop an account of good guesses comes from a recent paper by Dorst and Mandelkern (forthcoming), who argue that good guesses are distinguished from bad ones by how well they optimize a tradeoff between accuracy and specificity. As we will see, there is much to recommend their proposal: their central idea, that good guesses optimize a tradeoff between accuracy and specificity, seems to me a promising one. Nonetheless, I will argue that their implementation of this idea fails to satisfy some plausible constraints on good guesses (Sects. 2 and 3), and I will develop an alternative implementation that satisfies the relevant constraints (Sect. 4). The result will be a new account of good guesses which retains the positive aspects of Dorst and Mandelkern’s proposal, but without the drawbacks.

2 Dorst and Mandelkern on good guesses

Let us start by asking: what do we want from our guesses? We obviously want them to be correct—we want to make accurate guesses. But accuracy cannot be all we care about. After all, it is easy to ensure that your guess is correct: just guess the disjunction of every possible outcome! In the horse race scenario, this would amount to guessing ‘Some horse will win.’ That’s clearly not a good guess. We also want to take a stand on things—to make a specific guess, one that narrows down the set of possibilities under consideration.Footnote 1 These goals directly compete: the more specific your guess is, the less likely it is to be accurate; and the less specific it is, the more likely it is to be accurate. Thus, we face a tradeoff: we must strike a balance between making guesses that are likely to be accurate, but not too unspecific; specific, but not too unlikely to be accurate. In making this tradeoff, some agents might be more or less “risk-averse” than others. Someone who is risk-averse (in this sense) will place more weight on accuracy than specificity, whereas someone who is risk-taking will place more weight on specificity than accuracy. And what constitutes a good guess depends on how risk-averse or risk-taking you are: a good guess is one that optimizes your preferred accuracy-specificity tradeoff.

This summarizes the basic idea behind Dorst and Mandelkern’s proposal. My concern here is how this idea is best made precise, so I will begin by taking a closer look at the details of Dorst and Mandelkern’s account. We begin with a set of possibilities, W, which we think of as the set of scenarios or “possible worlds” that are compatible with what the agent is certain of. A question, Q, is a partition of W: a set of mutually exclusive and exhaustive subsets of W.Footnote 2 The cells of the partition are the complete answers to Q. The size of Q, denoted |Q|, is the number of complete answers to Q. The agent’s credences are represented by a probability function, P, which is defined over the subsets of W. Since the agent is certain of W, and since the complete answers form a partition of W, the agent is certain that exactly one of the complete answers is correct.

To illustrate these definitions, let us apply them to the horse race example. The question under discussion is: “Which horse is going to win?” There are five complete answers: {Ajax, Benji, Cody, Dusty, Ember}. The size of the question is: |Q| = 5. And the probability distribution over the complete answers is: {Ajax: 60%, Benji: 18%, Cody: 14%, Dusty: 6%, Ember: 2%}.

That is the basic setup. Next, we need to make precise the idea that good guesses optimize an agent’s preferred accuracy-specificity tradeoff. To this end, Dorst and Mandelkern assign to each answer, p, an answer-value, V(p), which represents how good it would be to guess p in response to the question under discussion. Since we want our guesses to be accurate, the answer-value of p depends, in part, on whether p is true. So, whenever you’re uncertain about whether p is true, you will be uncertain of p’s answer-value. Still, you can form an expectation of p’s answer-value. Let V+(p) be p’s answer-value if true, and let V−(p) be p’s answer-value if false. The expected answer-value of p is then given by the following weighted average:

$$ E(p) = P(p)\cdot V^{ + } (p) + P(\sim p)\cdot V^{ - } (p). $$

According to Dorst and Mandelkern, this is the quantity one should aim to maximize when forming one’s guess. That is, Dorst and Mandelkern propose the following norm of guessing:

  • Guessing as maximizing: A guess is permissible iff no other guess has a higher expected answer-value.

Before we can derive any predictions from this norm, we need to say more about what determines the answer-value of a guess. Recall that the answer-value of a guess is supposed to depend on two factors: its accuracy and its specificity. To capture the accuracy-component, Dorst and Mandelkern require that V be truth-directed:

  • Truth-directedness: V+(p) > V−(p).

This is just to say that true guesses are better than false ones. In addition, Dorst and Mandelkern make the simplifying assumption that false guesses never have any positive or negative answer-value (i.e., V−(p) = 0), and hence that true guesses always have a positive answer-value (i.e., V+(p) > 0).Footnote 3 This allows us to simplify the expression for the expected answer-value of a guess as follows:

$$ E\left( p \right) = P\left( p \right)\cdot V^{ + } \left( p \right). $$

Since V−(p) has dropped out of the equation, I will henceforth refer to V+(p) simply as “the answer-value of p.”

So much for the accuracy-component. What about the specificity-component? Just as true guesses are better than false ones, specific guesses are better than unspecific ones. So, V+ should not only be truth-directed, but also specificity-directed:

  • Specificity-directedness: V+(p) is an increasing function of p’s specificity.

To make this requirement precise, we obviously need to say more about what determines the specificity of a guess—we need a measure of specificity. What should such a measure look like? We have said that the specificity of a guess is supposed to tell us something about how far the guess “narrows down” the set of possibilities under consideration. And how far a guess can narrow down the set of possibilities under consideration depends on the size of the question under consideration: if the size of the question is quite large—i.e., if the question divides up the set of possibilities in a quite fine-grained way—there is a potential to rule out a large number of possibilities, whereas if the size of the question is quite small—i.e., if the question divides up the set of possibilities in a quite coarse-grained way—there is only a potential to rule out a small number of possibilities. To capture this dependency, Dorst and Mandelkern propose to measure the specificity of a guess in terms of the proportion of complete answers it rules out:

  • Dorst–Mandelkern specificity: The Dorst–Mandelkern specificity of a guess, p, in response to a question, Q, is given by:

    $$S_{{{\text{DM}}}} (p) = \frac{{|Q| - c_{p} }}{{|Q|}},$$

    where cp is the number of complete answers that are compatible with p: cp = |{q ∈ Q: p ∩ q ≠ ∅}|.

If, for example, the question is “Which horse is going to win?,” then SDM(Ajax) = 4/5, SDM(Benji or Dusty) = 3/5, SDM(Some horse will win) = 0, and so on.
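To make the measure concrete, here is a minimal Python sketch (mine, not Dorst and Mandelkern’s) that computes SDM for the three guesses just mentioned; the function and variable names are my own.

```python
# A minimal sketch (not the authors' code) of the Dorst-Mandelkern specificity
# measure, applied to the horse race question with five complete answers.
from fractions import Fraction

def dm_specificity(num_compatible: int, question_size: int) -> Fraction:
    """S_DM(p) = (|Q| - c_p) / |Q|: the proportion of complete answers p rules out."""
    return Fraction(question_size - num_compatible, question_size)

Q_SIZE = 5  # "Which horse is going to win?" has five complete answers

print(dm_specificity(1, Q_SIZE))  # 'Ajax' is compatible with one answer -> 4/5
print(dm_specificity(2, Q_SIZE))  # 'Benji or Dusty' -> 3/5
print(dm_specificity(5, Q_SIZE))  # 'Some horse will win' -> 0
```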

Note that SDM(p) is a decreasing function of cp, and an increasing function of |Q|: when cp = |Q| (i.e., when p includes every complete answer), SDM(p) = 0, and when cp = 1 (i.e., when p is itself a complete answer), SDM(p) = (|Q| − 1)/|Q|, which is a quantity that approaches 1 as |Q| approaches infinity. Hence, Dorst and Mandelkern’s specificity measure faithfully captures the idea that a specific guess is one that narrows down the space of possibilities under consideration, and that how far a guess can narrow down the space of possibilities depends on the size of the question under consideration. Later on, I will raise some concerns about their measure, and I will propose an alternative measure of specificity instead. But before I turn to the critical part of the paper, I want to introduce the rest of Dorst and Mandelkern’s account.

The final task is to capture the idea that different agents might have different preferred accuracy-specificity tradeoffs. To this end, Dorst and Mandelkern introduce a parameter, J, which is supposed to represent the degree to which an agent values specificity relative to accuracy: the higher the value of J, the more the agent values specificity relative to accuracy.Footnote 4 Thus, the answer-value of p should be an increasing function of J, just as it should be an increasing function of p’s specificity. From a purely mathematical point of view, there are many different measures of answer-value that satisfy this constraint, but Dorst and Mandelkern opt for the following measure (for reasons I will review shortly):

  • Dorst–Mandelkern answer-value: The Dorst–Mandelkern answer-value of a guess, p, in response to a question, Q, is given by:

    $$ V_{{{\text{DM}}}}^{ + } (p) = J^{{S_{{{\text{DM}}}} (p)}} = J^{{(|Q| - c_{p} )/|Q|}} , $$

    where J ≥ 1.

So, for example, if the question is “Which horse is going to win?,” \({{V}}_{\text{DM}}^{+}(Ajax)\) = \(J^{4/5}\), \({{V}}_{\text{DM}}^{+}(\textit{Benji\,or\,Dusty})\) = \(J^{3/5}\), \({{V}}_{\text{DM}}^{+}(\textit{Some\,horse\,will\,win})\) = \(J^{0}\) = 1, and so on. Note that \({V}_{\text{DM}}^{+}(p)\) is indeed an increasing function of both J and p’s specificity (except when J = 1, in which case \({{V}}_{\text{DM}}^{+}(p)\) = 1 regardless of the value of \({{S}}_{\text{DM}}(p)\), which reflects the fact that the agent places no weight at all on specificity).

Given this measure of answer-value, the expected answer-value of a guess becomes:

$$ E_{\text{DM}}(p) = P(p) \cdot J^{S_{\text{DM}}(p)} = P(p) \cdot J^{(|Q| - c_{p})/|Q|}. $$

What Dorst and Mandelkern’s account says, then, is that p is a permissible guess iff there is no other guess, q, such that EDM(q) > EDM(p).

To illustrate how their account works, consider again the horse race example, with the winning chances given above.

Which guess has the highest expected Dorst–Mandelkern answer-value? It depends on the J-value: given a low enough J-value, the best guess will simply be the most probable answer, which is always going to be the entire disjunction (‘Some horse will win’). Given a high enough J-value, the best guess will be the most probable complete answer (‘Ajax’). And for intermediate J-values, the best guess will be either ‘Ajax or Benji’, ‘Ajax, Benji, or Cody’, or ‘Ajax, Benji, Cody, or Dusty’, in order of decreasing J-value (as illustrated by Fig. 1).
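This pattern is easy to verify numerically. The following sketch (not from the paper) searches the filtered “top-k” guesses for the one with the highest expected Dorst–Mandelkern answer-value at a few sample J-values, using the winning chances given above; the particular sample J-values are chosen only for illustration, and since the account satisfies Filtering, restricting the search to filtered guesses is harmless.

```python
# A rough sketch of how the best guess shifts with J on Dorst and Mandelkern's
# account, using the horse race probabilities from the text.
CHANCES = {"Ajax": 0.60, "Benji": 0.18, "Cody": 0.14, "Dusty": 0.06, "Ember": 0.02}
HORSES = list(CHANCES)
Q_SIZE = len(HORSES)

def expected_dm_value(guess: tuple, j: float) -> float:
    """E_DM(p) = P(p) * J^((|Q| - c_p) / |Q|) for a guess given as a tuple of horses."""
    prob = sum(CHANCES[h] for h in guess)
    specificity = (Q_SIZE - len(guess)) / Q_SIZE
    return prob * j ** specificity

# By Filtering, only the "top-k" disjunctions can be optimal, so we search those.
filtered_guesses = [tuple(HORSES[:k]) for k in range(1, Q_SIZE + 1)]

for j in (1.05, 1.25, 2.0, 3.0, 4.0):
    best = max(filtered_guesses, key=lambda g: expected_dm_value(g, j))
    print(f"J = {j}: best guess is {' or '.join(best)}")
# As J increases, the best guess shrinks from the full disjunction ('Some horse
# will win') down to 'Ajax', passing through every intermediate filtered guess.
```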

Fig. 1: Expected Dorst–Mandelkern answer-value of various horse race guesses, depending on the J-value

Why think this is the right way of implementing the idea that good guesses optimize an agent’s preferred accuracy-specificity tradeoff? Dorst and Mandelkern helpfully address this question by considering various putative constraints on good guesses, which their account satisfies. Here is one such constraint (which Dorst and Mandelkern attribute to Holguín 2022, §10):

  • Filtering: A permissible guess is always filtered: if it includes a complete answer, p, it must include all complete answers that are more probable than p.

The motivation behind Filtering is straightforward: if p isn’t filtered, there must exist two complete answers, a and a′, such that (i) a is less probable than a′, and (ii) p includes a, but not a′. We can then define an alternative guess, p′, which is identical to p except that it includes a′ instead of a. Given this, p and p′ are equally specific (they include the same number of complete answers), but p′ is more probable than p, which means that p′ is a better guess than p regardless of one’s preferred accuracy-specificity tradeoff.

It’s worth noting that Filtering can be equivalently formulated as followsFootnote 5:

  • No accuracy-dominance: A permissible guess is never accuracy-dominated, where p accuracy-dominates p′ iff cp ≤ cp′ and P(p) > P(p′).

This formulation of Filtering makes it even clearer why it must hold: it follows immediately from the idea that accurate guesses are, all else being equal, better than inaccurate ones. Furthermore, it makes salient a different constraint, which Dorst and Mandelkern do not explicitly discuss, but which their account satisfies:

  • No specificity-dominance: A permissible guess is never specificity-dominated, where p specificity-dominates p′ iff cp < cp′ and P(p) ≥ P(p′).

Like No Accuracy-Dominance, this constraint is easy to motivate: it follows immediately from the idea that specific guesses are, all else being equal, better than unspecific ones. This also means that neither constraint offers much help in deciding between different measures of answer-value: all they tell us is that our measure of answer-value must be truth-directed and specificity-directed.
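For concreteness, the two dominance relations can be encoded as follows. The sketch is mine, not the authors’; each guess is represented simply by its probability P(p) and its number of compatible answers cp, and the example guesses are drawn from the horse race table.

```python
# A sketch (not from the paper) of the two dominance relations, with each guess
# represented by its probability P(p) and its number of compatible answers c_p.
def accuracy_dominates(p: dict, q: dict) -> bool:
    """p accuracy-dominates q iff c_p <= c_q and P(p) > P(q)."""
    return p["c"] <= q["c"] and p["P"] > q["P"]

def specificity_dominates(p: dict, q: dict) -> bool:
    """p specificity-dominates q iff c_p < c_q and P(p) >= P(q)."""
    return p["c"] < q["c"] and p["P"] >= q["P"]

# Horse race examples: 'Ajax or Benji' accuracy-dominates the unfiltered guess
# 'Cody or Ember', and 'Ajax' specificity-dominates 'Dusty or Ember'.
ajax_benji = {"P": 0.78, "c": 2}
cody_ember = {"P": 0.16, "c": 2}
ajax = {"P": 0.60, "c": 1}
dusty_ember = {"P": 0.08, "c": 2}

print(accuracy_dominates(ajax_benji, cody_ember))   # True
print(specificity_dominates(ajax, dusty_ember))     # True
```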

However, Dorst and Mandelkern consider an additional constraint (also from Holguín 2022, §10) which places considerably more stringent demands on a measure of answer-value:

  • Optionality: For any question, Q, and any k: 1 ≤ k ≤ |Q|, some accuracy-specificity tradeoff renders it permissible to guess the disjunction of exactly k complete answers.

We can think of this constraint as saying that it is “up to the agent” to decide how many complete answers they want to include in their guess. For example, in the horse race scenario, you may include anywhere from one to five horses in your guess, depending on your preferred accuracy-specificity tradeoff. As we have seen, this is precisely what Dorst and Mandelkern’s account predicts (see again Fig. 1). That is no accident: as Dorst and Mandelkern show, their account satisfies Optionality in full generality.

Why think that an account of good guesses should satisfy Optionality? Dorst and Mandelkern’s main motivation—which can also be found in Holguín (2022, §10)—comes from reflecting intuitively on the kinds of answers one might sensibly give in response to a question like “Which horse is going to win?”. It clearly makes sense to guess the outright favorite ‘Ajax’. But if you prefer to take a less opinionated stance, it could also make sense to make a disjunctive guess like ‘Ajax or Benji’ or ‘not Ember’. And if you really don’t want to stick your neck out, it could even make sense to answer ‘Some horse will win’ by reference to the fact that all five horses have a non-zero chance of winning.

It seems, then, that the norms of guessing are quite permissive: they don’t force us to include any particular number of complete answers in our guess. This is what Optionality is supposed to capture. Eventually, I will offer some reasons to doubt that something as strong as Optionality holds without exception. But at this point in the dialectic, I just want to acknowledge that Optionality has a good deal of intuitive appeal, which offers at least some prima facie support for Dorst and Mandelkern’s account.

3 Problems for Dorst and Mandelkern’s account

I now want to raise some concerns about Dorst and Mandelkern’s proposal. The concerns center around two putative constraints on good guesses, which I think an account of good guesses should satisfy, and which Dorst and Mandelkern’s account turns out not to satisfy. Below I introduce each constraint, explain why I find it attractive, and show how Dorst and Mandelkern’s account violates it. This will then lead to the positive proposal of the paper, which I will present in Sect. 4.

3.1 Neutrality

To illustrate the first concern, let us begin by considering a simple coin flip:

[Table: a fair coin flip; Heads and Tails each have a 50% chance]

Compare the guesses ‘Heads’ (‘H’) and ‘Heads or Tails’ (‘H v T’). Which is better? It clearly depends on your preferred accuracy-specificity tradeoff: if you value accuracy to a high enough degree, ‘H v T’ will be the better guess, and if you value specificity to a high enough degree, ‘H’ will be the better guess. In other words, for low enough J-values, ‘H v T’ will be the better guess, and for high enough J-values, ‘H’ will be the better guess. Given this, we should expect that, if we set J at just the right intermediate value, ‘H’ and ‘H v T’ will be tied: neither guess will be better than the other. This is exactly what Dorst and Mandelkern’s account predicts: ‘H’ and ‘H v T’ have the same expected Dorst–Mandelkern answer-value iff J = 4.Footnote 6 So far, no problem.

But now suppose we replace the coin with a six-sided die:

[Table: a fair six-sided die; each outcome has a 1/6 chance]

Let us again compare one of the maximally specific guesses, say ‘1’, to the entire disjunction ‘1 v 2 v 3 v 4 v 5 v 6’. Which is better? When J = 4, it turns out that ‘1’ has a lower expected Dorst–Mandelkern answer-value than ‘1 v 2 v 3 v 4 v 5 v 6’.Footnote 7 In other words, Dorst and Mandelkern’s account predicts that when you are indifferent between ‘H’ and ‘H v T’, you won’t be indifferent between ‘1’ and ‘1 v 2 v 3 v 4 v 5 v 6’. And you won’t be indifferent between the various intermediate guesses (‘1 v 2’, ‘1 v 2 v 3’, and so on) either: when you are indifferent between ‘H’ and ‘H v T’, you will consider ‘1 v 2 v 3 v 4’ to be the uniquely best guess in the die roll case (see Fig. 2 for an illustration).Footnote 8
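The two calculations reported in this and the previous paragraph can be checked with a short sketch (again mine, not the authors’):

```python
# A quick numerical check of the claims in the text: with J = 4, 'H' ties 'H v T'
# on the coin, but the die roll guesses do not all tie, and '1 v 2 v 3 v 4' comes
# out uniquely best.
def expected_dm_value(prob: float, c: int, q_size: int, j: float) -> float:
    """E_DM(p) = P(p) * J^((|Q| - c_p) / |Q|)."""
    return prob * j ** ((q_size - c) / q_size)

J = 4

# Fair coin: 'H' vs 'H v T'.
print(expected_dm_value(0.5, 1, 2, J))   # 1.0
print(expected_dm_value(1.0, 2, 2, J))   # 1.0 -- tied

# Fair die: the filtered guesses '1', '1 v 2', ..., '1 v 2 v 3 v 4 v 5 v 6'.
for k in range(1, 7):
    label = "1" if k == 1 else f"1 v ... v {k}"
    print(f"{label}: {expected_dm_value(k / 6, k, 6, J):.3f}")
# Output: 0.529, 0.840, 1.000, 1.058, 1.050, 1.000 -- '1 v 2 v 3 v 4' is best.
```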

Fig. 2: Expected Dorst–Mandelkern answer-value of various die roll guesses (setting J = 4)

This strikes me as an odd result. There doesn’t seem to be anything unnatural about being indifferent between ‘H’ and ‘H v T’ while also being indifferent between ‘1’, ‘1 v 2 v 3 v 4 v 5 v 6’, and the various intermediate guesses in the die roll case (‘1 v 2’, ‘1 v 2 v 3’, and so on). On the contrary, it seems possible to justify such a guessing pattern by reference to the fact that when the probability distribution over the complete answers is flat, there is a sense in which you have nothing to go on when forming your guess—your guess won’t be informed by anything. Of course, you might still prefer some guesses over others on the grounds that you are risk-averse or risk-seeking. But it’s natural to think that there should be a “risk-neutral” perspective from which all guesses look equally good or bad when the probability distribution is flat.

If this isn’t already clear enough, the following decision-theoretic analogy may help drive home the point. Suppose you are offered the following bets by a fair bookie who assigns equal probability to ‘Heads’ and ‘Tails’:

[Table: bets on the coin flip, each with an expected value of $1]

If you had to take one of these bets, which one would you pick? It clearly depends on how risk-averse or risk-taking you are: if you are risk-averse, you will prefer to bet on ‘H v T’, and if you are risk-taking, you will prefer to bet on ‘H’. But if you are risk-neutral, you will be indifferent between the bets, since each bet has an expected value of $1.

The same goes for the following set of bets:

[Table: bets on the die roll, each with an expected value of $1]

Again, which bet you will prefer depends on how risk-averse or risk-taking you are: if you are sufficiently risk-averse, you will prefer to bet on the entire disjunction ‘1 v 2 v 3 v 4 v 5 v 6’, and if you are sufficiently risk-taking, you will prefer to bet on ‘1’. But if you are risk-neutral, you will be indifferent between the bets, since each bet has an expected value of $1.

It is natural, then, to suppose that there exists a risk-neutral perspective from which all of the above bets look equally good. And although I don’t want to make any general claims about the relationship between guessing and betting, I submit that guessing is similar to betting in this respect: it is very natural to suppose that there exists a risk-neutral perspective from which all guesses look equally good when the probability distribution is flat.

If this is right, then an account of good guesses should satisfy the following constraint:

  • Neutrality: Some accuracy-specificity tradeoff is “risk-neutral” in the sense that, for any question, Q, if the probability distribution over the complete answers to Q is flat (that is, if each of the complete answers has a probability of 1/|Q|), all guesses, whether complete or disjunctive, have the same expected answer-value.

Yet, as we have seen, Dorst and Mandelkern’s account fails to satisfy Neutrality. This is the first concern I wanted to raise.

Before we move on to the second concern, it’s worth pausing to consider whether something stronger than Neutrality might hold. We have just said that there exists a neutral accuracy-specificity tradeoff when the probability distribution is flat. Given this, it is natural to wonder whether there might always exist a neutral accuracy-specificity tradeoff, even when the probability distribution isn’t flat. Might there exist an accuracy-specificity tradeoff relative to which all guesses are equally good, regardless of what the probability distribution looks like?

I think the answer to this question is negative. Consider the following version of the horse race example:

[Table: winning chances for a modified version of the horse race]

Suppose you are indifferent between ‘Ajax’ and ‘Ajax or Benji’. What does this tell us about your preferred accuracy-specificity tradeoff? Presumably, it tells us that you are not vastly more concerned with accuracy than specificity, or else you would have preferred ‘Ajax or Benji’ to ‘Ajax’. Conversely, it tells us that you are not vastly more concerned with specificity than accuracy, or you would have preferred ‘Ajax’ to ‘Ajax or Benji’. Instead, you prefer a fairly balanced tradeoff between accuracy and specificity. We can then ask: given your preferred accuracy-specificity tradeoff, how will you rate ‘Ajax, Benji or Cody’ in comparison to ‘Ajax or Benji’? Presumably, you won’t think that the very modest boost in accuracy (5 percentage points) makes up for the relatively significant loss of specificity.Footnote 9 In other words, you will consider ‘Ajax, Benji or Cody’ to be a worse guess than ‘Ajax or Benji’, and hence a worse guess than ‘Ajax’ as well. So if you are indifferent between ‘Ajax’ and ‘Ajax or Benji’, it looks like you won’t be indifferent between ‘Ajax’, ‘Ajax or Benji’ and ‘Ajax, Benji or Cody’.Footnote 10

Of course, this doesn’t show that the probability distribution must be flat for there to exist an accuracy-specificity tradeoff relative to which all answers are equally good. But it shows that we shouldn’t in general expect such a neutral accuracy-specificity tradeoff to exist unless the probability distribution is flat.

3.2 Independence of irrelevant alternatives (for guessing)

The second concern I want to raise for Dorst and Mandelkern’s proposal can be illustrated with another simple example. Suppose a golf tournament is about to start. As so often before, Tiger Woods is the heavy favorite, followed by Phil Mickelson at a distant second:

[Table: winning chances, with Woods the heavy favorite, Mickelson a distant second, and the remaining players grouped together as ‘Other’]

Compare the guesses ‘Woods’ and ‘Woods or Mickelson’. Which is better? It’s clear that ‘Woods’ is a good deal more specific than ‘Woods or Mickelson’ while being only slightly less probable. So if you place at least a bit of weight on specificity, you will presumably consider ‘Woods’ to be a better guess than ‘Woods or Mickelson’. This is precisely what Dorst and Mandelkern’s account predicts: ‘Woods’ has a higher expected Dorst–Mandelkern answer-value than ‘Woods or Mickelson’ iff J > 1.03.Footnote 11 So far, no problem.

But now let us suppose that instead of lumping all of Woods and Mickelson’s competitors together into one category, ‘Other’, we separate them out (let us say that there are n such players):

[Table: the same winning chances, with ‘Other’ separated into n individual players]

Effectively, we have made the option ‘Other’ more fine-grained. Should such fine-graining make a difference to whether ‘Woods’ is a better guess than ‘Woods or Mickelson’? According to Dorst and Mandelkern’s account, the answer turns out to be ‘yes’. Their account predicts that as we increase the value of n—that is, as we increase the number of competitors—‘Woods or Mickelson’ eventually overtakes ‘Woods’ as the best guess.

To illustrate this, let us suppose that J = 5. Given this, we have seen that ‘Woods’ has a higher expected Dorst–Mandelkern answer-value than ‘Woods or Mickelson’ when n = 1 (this corresponds to the original scenario in which ‘Other’ is the only other option). But once we start increasing the value of n, ‘Woods or Mickelson’ eventually overtakes ‘Woods’ (as shown in Fig. 3). The same thing can be shown to happen regardless of how high we choose the J-value to be. That is to say, no matter how much you value specificity, ‘Woods or Mickelson’ will eventually overtake ‘Woods’, once the number of competitors becomes sufficiently high.Footnote 12
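Since the exact table is not reproduced here, the following sketch uses assumed illustrative probabilities (Woods 90%, Mickelson 9%, with the residual 1% split evenly among the n other players); what matters is the qualitative overtaking behaviour, not the particular crossover point, which will vary with the assumed numbers.

```python
# A sketch of the overtaking behaviour, with *assumed* illustrative probabilities:
# Woods 90%, Mickelson 9%, and a residual 1% split evenly among n other players.
# J = 5, as in the text.
def expected_dm_value(prob: float, c: int, q_size: int, j: float) -> float:
    """E_DM(p) = P(p) * J^((|Q| - c_p) / |Q|)."""
    return prob * j ** ((q_size - c) / q_size)

P_WOODS, P_MICKELSON, J = 0.90, 0.09, 5

for n in (1, 5, 10, 15, 50):
    q_size = 2 + n  # Woods, Mickelson, and n other players
    e_woods = expected_dm_value(P_WOODS, 1, q_size, J)
    e_both = expected_dm_value(P_WOODS + P_MICKELSON, 2, q_size, J)
    winner = "'Woods'" if e_woods > e_both else "'Woods or Mickelson'"
    print(f"n = {n:2d}: {winner} does better")
# With these (assumed) numbers, 'Woods or Mickelson' overtakes 'Woods' at n = 15,
# even though the probabilities of the two guesses never change.
```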

Fig. 3: Expected Dorst–Mandelkern answer-value of ‘Woods’ and ‘Woods or Mickelson’, depending on n (setting J = 5 as an illustration)

Again, this strikes me as an odd result. We have just said that ‘Woods’ looks considerably more specific than ‘Woods or Mickelson’ while being only slightly less probable. So if you value specificity to a high enough degree, it would seem perfectly natural to guess ‘Woods’ rather than ‘Woods or Mickelson’, regardless of how many other players happen to share the residual 1% of the probability mass.

I want to suggest that this is an instance of a more general problem. The more general problem is that, on Dorst and Mandelkern’s account, whether one guess is better than another depends on which other guesses are available. In other words, Dorst and Mandelkern’s account fails to satisfy the following constraint:

  • Independence of irrelevant alternatives (for guessing): If p is a better guess than q relative to Q, then, for any question Q* such that (i) the set of complete answers to Q which are compatible with p is identical to the set of complete answers to Q* which are compatible with p, and (ii) the set of complete answers to Q which are compatible with q is identical to the set of complete answers to Q* which are compatible with q, p is a better guess than q relative to Q*.

Put differently, whether one guess is better than another relative to a given accuracy-specificity tradeoff should only depend on the probabilities of the guesses and the number of complete answers with which they are compatible, not on the size of the question. This constraint mirrors a more familiar principle from rational choice theory—call it “IIA for Preferences” to distinguish it from “IIA for Guessing”—which says that whether one option is preferable to another shouldn’t depend on which other options are available. For example, according to IIA for Preferences, whether you prefer Crème Brûlée to Tarte Tatin shouldn’t depend on whether there is also Mille-feuille on the dessert menu.Footnote 13 Similarly, according to IIA for Guessing, whether one guess is better than another shouldn’t depend on which other guesses are available. For example, whether ‘Woods’ is a better guess than ‘Woods or Mickelson’ shouldn’t depend on how many other players happen to share the residual 1% of the probability mass.

Like many who have found IIA for Preferences to be an intuitive constraint on rational preferences, I find IIA for Guessing to be an intuitive constraint on good guesses. This is not to say that the two principles stand or fall together. Indeed, one of the most common challenges to IIA for Preferences does not, I think, have similar force against IIA for Guessing. Consider the following putative counterexample to IIA for Preferences, due to Sen (1993, p. 501):

  • Polite cake-eating: You are choosing between slices of cake. You are trying to choose as large a slice as possible, subject to not choosing the very largest (you don’t want to appear greedy). Accordingly, when choosing from the menu {Small, Medium}, you prefer the small slice over the medium slice, but when choosing from the menu {Small, Medium, Large}, you prefer the medium slice over the small one.

As has often been pointed out, it is unclear whether a case like this really constitutes a genuine counterexample to IIA for Preferences, or whether it just shows that we need to be careful about how to individuate the available options.Footnote 14 But however this may be, the parallel challenge to IIA for Guessing is much less compelling. Consider the following analogue of Sen’s example:

  • Cheap guesses: You are guessing which horse is going to win the race. You are trying to choose a likely winner, subject to not picking the very biggest favorite (you consider that to be “too cheap”). Accordingly, when choosing from {Cody, Benji}, you prefer guessing ‘Cody’ over ‘Benji’, but when choosing from {Cody, Benji, Ajax}, you prefer guessing ‘Benji’ over ‘Cody’.

Although this case has the surface structure of a counterexample to IIA for Guessing, it seems clear that you are not really expressing your best guess here. Rather, you are performing a kind of speech act which may be influenced by various pragmatic factors having nothing to do with the accuracy or specificity of your guess (say, an aversion against making guesses that are hard to pronounce, or an aversion against making guesses that remind you of your ex-partner).Footnote 15

There is, however, a different way one might try to push back against IIA for Guessing. Consider the following variation on the golf case:

[Table: winning chances for the modified golf case]

Again, we want to compare ‘Woods’ and ‘Woods or Mickelson’, for different values of n. For concreteness, let us focus on n = 1 vs. n = 100:

[Table: the modified golf case with n = 1 and with n = 100]

Suppose you consider ‘Woods’ to be a better guess than ‘Woods or Mickelson’ when n = 1. We can then ask: should you also consider ‘Woods’ to be a better guess than ‘Woods or Mickelson’ when n = 100? According to IIA for Guessing, the answer is ‘yes’ (assuming that your preferred accuracy-specificity tradeoff doesn’t change). But you might think that it can be reasonable to guess ‘Woods or Mickelson’ when n = 100, even if you prefer to guess ‘Woods’ when n = 1. After all, when n = 100, Tiger Woods and Phil Mickelson “stand out” as the two clear favorites in a way that they don’t when n = 1. In other words, there is a sense in which ‘Woods or Mickelson’ becomes an increasingly salient guess as n increases. Given this, it might seem natural to switch guesses from ‘Woods’ to ‘Woods or Mickelson’ as n increases from 1 to 100.

Although this way of thinking has some intuitive appeal, I find it difficult to uphold the intuition on reflection. Imagine that you and your friend are about to watch a golf tournament featuring three professional players: Woods (50%), Mickelson (40%), and McIlroy (10%). During the warm-up session, your friend asks: “Who do you think is going to win?” You answer: “Woods.” A moment later, a sad announcement is made: McIlroy is forced to withdraw due to a knee injury. Instead, the organizers have decided to let 100 amateur golfers compete, each with a 0.1% chance of winning. In light of these changes, your friend asks again: “Who do you now think is going to win?” You answer: “Woods or Mickelson.”

This strikes me as an odd response; and the reason why it strikes me as an odd response, I think, is that it’s hard for me to see how it could matter whether the residual 10% of the probability mass is distributed across a hundred players or just one, even if this makes a difference to how clearly Woods and Mickelson stand out as the two biggest favorites. Perhaps not everyone will share this intuition. I certainly don’t purport to have given a decisive argument for IIA for Guessing here. But I think enough can be said in its favor to make it worthwhile exploring whether we can formulate an account of good guesses that satisfies it.

4 An alternative proposal

I now want to offer an alternative account of good guesses, which avoids the concerns raised. The account is in many ways congenial to Dorst and Mandelkern’s account: it also says that good guesses optimize a tradeoff between accuracy and specificity. But the underlying measure of specificity is different. Rather than measuring specificity in terms of the proportion of complete answers a guess rules out, I propose to measure specificity in terms of the following log-ratio:

  • Specificity: The specificity of a guess, p, in response to a question, Q, is given by:

    $$ S(p) = \log \frac{{|Q|}}{{c_{p} }},$$

    where cp is the number of complete answers that are compatible with p: cp = |{q ∈ Q: p ∩ q ≠ ∅}|.

Before we explore the consequences of adopting this measure of specificity, I want to highlight some of its properties.

The first thing to observe is that S(p) is a decreasing function of cp, and an increasing function of |Q|: when cp = |Q|, S(p) = 0, and when cp = 1, S(p) = log(|Q|), which is a quantity that approaches infinity as |Q| approaches infinity. In this respect, our new specificity measure is similar to that of Dorst and Mandelkern: both measures faithfully capture the idea that a specific guess is one that narrows down the space of possibilities under consideration, and that how far a guess can narrow down the space of possibilities depends on the size of the question under consideration.

The second thing to observe is that our new specificity measure bears resemblance to Shannon’s (1948) classic measure of information, according to which the amount of information contained in a proposition, p, is given by log(1/P(p)). When the probability distribution over the complete answers is flat—that is, when each complete answer has a probability of 1/|Q|—the probability of p is given by cp/|Q|, which means that the specificity of p reduces to S(p) = log(1/P(p)). So when the probability distribution is flat, the specificity of a proposition is identical to its Shannon information.Footnote 16
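Here is a minimal sketch of the proposed measure, taking the logarithm to be the natural one (as the point-of-neutrality result below, J = e, requires), together with a check of the Shannon connection for a fair die; the function names are my own.

```python
# A minimal sketch of the log-ratio specificity measure (natural logarithm), plus
# a check of the Shannon connection for a fair six-sided die.
import math

def specificity(c: int, q_size: int) -> float:
    """S(p) = log(|Q| / c_p)."""
    return math.log(q_size / c)

def shannon_information(prob: float) -> float:
    """log(1 / P(p))."""
    return math.log(1 / prob)

Q_SIZE = 6  # a fair die, so the distribution over complete answers is flat
for c in (1, 2, 3, 6):
    flat_prob = c / Q_SIZE  # under a flat distribution, P(p) = c_p / |Q|
    print(round(specificity(c, Q_SIZE), 4), round(shannon_information(flat_prob), 4))
# The two columns coincide: with a flat distribution, a guess's specificity just
# is its Shannon information.
```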

This also means that our new measure of specificity inherits certain features of Shannon’s measure of information. For example, S(p) has no finite upper bound: it ranges from 0 to log(|Q|), which is a quantity that approaches infinity as |Q| approaches infinity. By contrast, Dorst and Mandelkern’s measure of specificity does have a finite upper bound: it ranges from 0 to (|Q| − 1)/|Q|, which is a quantity that approaches 1 as |Q| approaches infinity. Although I won’t place much weight on this difference, it seems to me that, insofar as we have an intuitive grip on the notion of specificity, there should be no upper limit on how specific a guess can in principle be, just as there is no upper limit on how informative a proposition can in principle be. But as I said, I won’t rest anything much on this point. The proposed measure of specificity will earn its keep by helping us solve the problems raised in the previous section.

With our new specificity measure in hand, we can define the answer-value of a guess as follows:

  • Answer-value: The answer-value of a guess, p, in response to a question, Q, is given by:

    $$ V^{ + } \left( p \right) = J^{S(p)} = J^{\log (\left| Q \right|/c_p)} , $$

    where J ≥ 1.

Accordingly, the expected answer-value of p becomes:

$$ E\left( p \right) = P\left( p \right) \cdot J^{S(p)} = P\left( p \right) \cdot J^{{{\text{log}}\left( {\left| Q \right|/c_p} \right)}} $$

This is the quantity that, I propose, one should try to maximize when forming one’s guess: a guess, p, is permissible iff there is no alternative guess, q, such that E(q) > E(p).

How does this account compare to that of Dorst and Mandelkern? Both accounts satisfy No Accuracy-Dominance and No Specificity-Dominance, since both are based on a truth-directed and specificity-directed measure of answer-value. But unlike Dorst and Mandelkern’s account, our new account also satisfies Neutrality and IIA for Guessing, though it does not satisfy Optionality. Let us consider each of these constraints in turn.

The following result shows that our new account satisfies Neutrality (see Appendix A for a proof):

  • Point of neutrality: Let p and q be arbitrary guesses in response to a question, Q, and suppose that the probability distribution over the complete answers to Q is flat. Then E(p) = E(q) iff J = e.

We can think of J = e as the “point of neutrality” from which one can become more risk-averse or risk-seeking by lowering or raising the J-value.Footnote 17 For example, in the die roll case, the best guess will be the entire disjunction ‘1 v 2 v 3 v 4 v 5 v 6’ when J < e; the six complete answers (‘1’, ‘2’, ‘3’, etc.) will tie for best when J > e; and all guesses will be equally good when J = e (see Fig. 4 for an illustration).
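Numerically, the point of neutrality is easy to check for the die roll case. The following sketch (mine, not a substitute for the proof in Appendix A) computes the expected answer-value of each filtered guess for a J-value below e, at e, and above e.

```python
# A numerical check of the point of neutrality for the die roll case: below e the
# full disjunction does best, above e the complete answers do best, and at J = e
# every guess has expected answer-value 1 (up to rounding).
import math

def expected_value(prob: float, c: int, q_size: int, j: float) -> float:
    """E(p) = P(p) * J^(log(|Q| / c_p)), with the natural logarithm."""
    return prob * j ** math.log(q_size / c)

Q_SIZE = 6
for j in (2.0, math.e, 3.5):
    values = [round(expected_value(c / Q_SIZE, c, Q_SIZE, j), 3) for c in range(1, 7)]
    print(f"J = {j:.3f}: {values}")
```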

Fig. 4: Expected answer-value of various die roll guesses, depending on J

Next, consider IIA for Guessing. We have seen that Dorst and Mandelkern’s account violates IIA for Guessing in cases like the following:

[Table: the golf case again, with ‘Other’ separated into n players]

Even if ‘Woods’ starts out as a better guess than ‘Woods or Mickelson’ for low values of n, Dorst and Mandelkern’s account implies that, as n increases, ‘Woods or Mickelson’ eventually overtakes ‘Woods’. By contrast, our new account implies that if ‘Woods’ starts out as a better guess than ‘Woods or Mickelson’ for low values of n, it remains better no matter how much we increase the value of n (as illustrated by Fig. 5). This can be shown to hold in full generality: whether one guess has a higher expected answer-value than another depends only on the probabilities of the guesses and the number of complete answers with which they are compatible, as required by IIA for Guessing (see Appendix B for a proof).
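To see why, note that on the new measure the comparison between two guesses is a function only of their probabilities and their c-values. The sketch below reuses the assumed probabilities from the earlier golf illustration (Woods 90%, Mickelson 9%, residual 1%; again my own illustrative numbers) and shows that the ratio of expected answer-values does not move as n grows.

```python
# A sketch showing why the comparison respects IIA for Guessing: the ratio of
# expected answer-values depends only on the guesses' probabilities and c-values.
# Same *assumed* probabilities as before (Woods 90%, Mickelson 9%, residual 1%), J = 5.
import math

def expected_value(prob: float, c: int, q_size: int, j: float) -> float:
    """E(p) = P(p) * J^(log(|Q| / c_p)), with the natural logarithm."""
    return prob * j ** math.log(q_size / c)

P_WOODS, P_MICKELSON, J = 0.90, 0.09, 5

for n in (1, 15, 100, 1000):
    q_size = 2 + n
    ratio = (expected_value(P_WOODS, 1, q_size, J)
             / expected_value(P_WOODS + P_MICKELSON, 2, q_size, J))
    print(f"n = {n:4d}: E('Woods') / E('Woods or Mickelson') = {ratio:.3f}")
# The ratio is (0.90 / 0.99) * 5^log(2), roughly 2.77, for every n, so 'Woods'
# stays the better guess no matter how finely 'Other' is carved up.
```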

Fig. 5: Expected answer-value of ‘Woods’ and ‘Woods or Mickelson’, depending on n (setting J = 5 as an illustration)

Finally, consider Optionality. Like Dorst and Mandelkern’s account, our new account predicts a fairly high degree of optionality when it comes to deciding how many complete answers to include in one’s guess. Here is a simple example:

[Table: Ajax 50%, Benji 30%, Cody 20%]

On our new account, each of the three filtered guesses (‘Ajax’, ‘Ajax or Benji’, and ‘Ajax, Benji or Cody’) can be made to have the highest expected answer-value by choosing the right J-value: ‘Ajax, Benji or Cody’ is best when J < 1.73; ‘Ajax’ is best when J > 1.97; and ‘Ajax or Benji’ is best otherwise (see Fig. 6).
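The two thresholds can be recovered by solving E(p) = E(q) for J directly; in the sketch below the closed-form expression and the variable names are mine.

```python
# A check of the reported thresholds for {Ajax: 50%, Benji: 30%, Cody: 20%}:
# solving P1 * J^log(|Q|/c1) = P2 * J^log(|Q|/c2) for J (the |Q| terms cancel).
import math

def crossover_j(p1: float, c1: int, p2: float, c2: int) -> float:
    """The J at which the two guesses have equal expected answer-value."""
    return math.exp(math.log(p2 / p1) / math.log(c2 / c1))

# 'Ajax or Benji' (0.8, c = 2) draws level with 'Ajax, Benji or Cody' (1.0, c = 3) at:
print(round(crossover_j(0.8, 2, 1.0, 3), 2))   # 1.73
# 'Ajax' (0.5, c = 1) draws level with 'Ajax or Benji' (0.8, c = 2) at:
print(round(crossover_j(0.5, 1, 0.8, 2), 2))   # 1.97
```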

Fig. 6: Expected answer-value of ‘Ajax’, ‘Ajax or Benji’, and ‘Ajax, Benji or Cody’, depending on J, where the probability distribution is {Ajax: 50%, Benji: 30%, Cody: 20%}

However, unlike Dorst and Mandelkern’s account, our new account doesn’t satisfy Optionality in full generality. Consider what happens if we change the probability distribution in such a way that Benji and Cody have almost the same winning chances:

[Table: Ajax 50%, Benji 26%, Cody 24%]

On Dorst and Mandelkern’s account, it will still be possible to pick a J-value relative to which ‘Ajax or Benji’ is a permissible guess.Footnote 18 But no such J-value exists on our new account.Footnote 19 Why not? The mechanism is illustrated in Fig. 7: for low J-values, ‘Ajax, Benji or Cody’ is a better guess than both ‘Ajax’ and ‘Ajax or Benji’ in virtue of having a higher probability. But once we start increasing the J-value, ‘Ajax’ and ‘Ajax or Benji’ both eventually overtake ‘Ajax, Benji or Cody’ in virtue of being more specific. And, as it turns out, ‘Ajax’ closes the gap to ‘Ajax, Benji or Cody’ more quickly than does ‘Ajax or Benji’. As a result, ‘Ajax or Benji’ never gets to be the best guess: it always loses to either ‘Ajax’ or ‘Ajax, Benji or Cody’ (or both).
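A brute-force check of this claim is straightforward (a sketch, not a proof; the proof-relevant reasoning is in the text and appendices):

```python
# A brute-force check that, with {Ajax: 50%, Benji: 26%, Cody: 24%}, the guess
# 'Ajax or Benji' is never uniquely best on the proposed account.
import math

def expected_value(prob: float, c: int, q_size: int, j: float) -> float:
    """E(p) = P(p) * J^(log(|Q| / c_p)), with the natural logarithm."""
    return prob * j ** math.log(q_size / c)

GUESSES = {  # probability and number of compatible complete answers
    "Ajax": (0.50, 1),
    "Ajax or Benji": (0.76, 2),
    "Ajax, Benji or Cody": (1.00, 3),
}

ever_uniquely_best = set()
for step in range(1000, 10001):  # J from 1.000 to 10.000 in steps of 0.001
    j = step / 1000
    values = {name: expected_value(p, c, 3, j) for name, (p, c) in GUESSES.items()}
    best = max(values, key=values.get)
    runner_up = max(v for name, v in values.items() if name != best)
    if values[best] > runner_up:
        ever_uniquely_best.add(best)

print(ever_uniquely_best)   # only 'Ajax' and 'Ajax, Benji or Cody' ever win outright
```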

Fig. 7: Expected answer-value of ‘Ajax’, ‘Ajax or Benji’, and ‘Ajax, Benji or Cody’, depending on J, where the probability distribution is {Ajax: 50%, Benji: 26%, Cody: 24%}

Are such violations of Optionality a feature or a bug of the proposed account? Like Dorst and Mandelkern, I find it very plausible that an account of good guesses should be quite permissive when it comes to deciding how many complete answers to include in one’s guess. But I doubt that something as strong as Optionality holds in full generality. Consider what happens if we further change the horse race example so that Benji and Cody end up with exactly the same winning chances:

[Table: Ajax 50%, Benji 25%, Cody 25%]

As Dorst and Mandelkern themselves observe, there is something rather odd about guessing ‘Ajax or Benji’ in such a case. What explains this? Dorst and Mandelkern’s suggestion is this: although it is permissible to guess ‘Ajax or Benji’, people tend to avoid making guesses that crosscut “clusters” of complete answers with similar probabilities. So, for example, people tend to prefer J-values that allow them to guess either ‘Ajax’ or ‘Ajax, Benji or Cody’ in a case like that above.

This is perhaps a natural enough explanation of the intuitive oddness of guessing ‘Ajax or Benji’. But another natural explanation, it seems to me, is that guessing ‘Ajax or Benji’ is indeed prohibited, as predicted by our new account. I’m not sure how to adjudicate between these rival explanations in isolation from other supporting considerations, such as those laid out previously in the paper. But it seems to me that the intuitive case for Optionality is much less compelling than one might have initially thought.

There is, however, a residual worry that one might have about the violations of Optionality engendered by our new account. Consider a schematic version of the horse race example just discussed (where 0.25 < x < 0.5):

[Table: Ajax 50%, Benji x, Cody 50% − x]

As we have seen, our new account violates Optionality when x = 26%, but not when x = 30%. Given this, we should expect that there exists a threshold, t, such that Optionality is violated iff x < t. This is precisely what we find: in the case at hand, the threshold turns out to be t ≈ 0.274.Footnote 20 But what could explain the existence of such a seemingly arbitrary cut-off? That is the worry.
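The threshold can be computed directly from the two crossover points involved. The following sketch assumes, as before, that the logarithm in the specificity measure is natural; the schematic guesses are written a, a v b, and a v b v c, as in Fig. 8.

```python
# Computing the threshold for the schematic distribution {Ajax: 50%, Benji: x,
# Cody: 50% - x}: the x at which 'a' and 'a v b' draw level with the full
# disjunction at the same J (natural logarithm throughout, as before).
import math

# 'a' (P = 0.5, c = 1) draws level with the full disjunction (P = 1, c = 3) at a J
# that does not depend on x:
j_a = math.exp(math.log(1 / 0.5) / math.log(3 / 1))   # ~1.879

# 'a v b' (P = 0.5 + x, c = 2) draws level with the full disjunction at that same J
# when (0.5 + x) * j_a^log(3/2) = 1; solving for x gives the threshold:
threshold = j_a ** (-math.log(3 / 2)) - 0.5
print(round(threshold, 3))   # ~0.274
```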

I think the worry can be at least partly assuaged by observing that the threshold in question is not introduced by hand, but rather falls out as a consequence of an independently motivated account of good guesses. The mechanism by which the threshold arises is illustrated in Fig. 8: when the probabilities of b and c are sufficiently far apart, a v b overtakes a v b v c before a does. But as the probabilities of b and c get closer and closer to each other, a gets closer and closer to overtaking a v b v c before a v b does. And eventually, when the probabilities of b and c get sufficiently close, a does indeed overtake a v b v c before a v b does, with the result that Optionality is violated.

Thus, there is a sense in which the threshold is not arbitrary at all: it is fully explicable by how the proposed account works. Of course, this won’t bring peace of mind to those who were hoping for a purely pre-theoretic explanation of why the threshold in question exists. But I doubt that pre-theoretic considerations will take us very far in deciding whether a threshold of this nature is acceptable. Either way, I take comfort in the fact that the threshold is not a result of gerrymandering, but a consequence of an independently motivated account of good guesses.

Fig. 8: Expected answer-value of a, a v b, and a v b v c, depending on J, for different values of x

In sum, although I certainly haven’t argued decisively against Optionality here, I am inclined to think that we should treat the putative counterexamples to Optionality as an interesting, and perhaps somewhat surprising, consequence of an account of good guesses whose main attractions are that it gives us a simple and elegant way to satisfy Neutrality and IIA for Guessing.

5 Conclusion

There is much to like about Dorst and Mandelkern’s account of good guesses. It is based on a simple and attractive idea, that good guesses optimize a tradeoff between accuracy and specificity, and it offers a precise implementation of this idea which lends itself to systematic investigation. Nonetheless, I have argued that their implementation fails to satisfy some plausible constraints on good guesses, and I have offered an alternative implementation that satisfies these constraints. Although the positive proposal of the paper differs from Dorst and Mandelkern’s proposal in some important respects, it retains the basic idea that good guesses optimize a tradeoff between accuracy and specificity. Thus, I would like to think that the considerations put forth in this paper do not undermine Dorst and Mandelkern’s overall project, but in fact vindicate it.