Abstract
Jeffrey conditionalization is a rule for updating degrees of belief in light of uncertain evidence. It is usually assumed that the partitions involved in Jeffrey conditionalization are finite and only contain positive-credence elements. But there are interesting examples, involving continuous quantities, in which this is not the case. Q1 Can Jeffrey conditionalization be generalized to accommodate continuous cases? Meanwhile, several authors, such as Kenny Easwaran and Michael Rescorla, have been interested in Kolmogorov’s theory of regular conditional distributions (rcds) as a possible framework for conditional probability which handles probability-zero events. However, the theory faces a major shortcoming: it seems messy and ad hoc. Q2 Is there some axiomatic theory which would justify and constrain the use of rcds, thus serving as a possible foundation for conditional probability? These two questions appear unrelated, but they are not, and this paper answers both. We show that when one appropriately generalizes Jeffrey conditionalization as in Q1, one obtains a framework which necessitates the use of rcds. It is then a short step to develop a general theory addressing Q2, which we call the theory of extensions. The theory is a formal model of conditioning which recovers Bayesian conditionalization, Jeffrey conditionalization, and conditionalization via rcds as special cases.
Notes
Recall that given two probability measures μ on \(({\varOmega }_{1},{\mathcal{F}}_{1})\) and ν on \(({\varOmega }_{2},{\mathcal{F}}_{2})\), there is a unique product measure μ ⊗ ν on \({\mathcal{F}}_{1}\otimes {\mathcal{F}}_{2}\) satisfying μ ⊗ ν(A ∩ B) = μ(A)ν(B) for any \(A\in {\mathcal{F}}_{1}\) and \(B\in {\mathcal{F}}_{2}\).
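For finite spaces the defining property of the product measure can be checked directly. The following sketch is our illustration, not from the paper, and all names in it are ours: it builds the product of two small discrete measures and verifies the rectangle identity.

```python
from itertools import product

# Two discrete probability measures on small sample spaces.
mu = {"a": 0.25, "b": 0.75}          # measure on Omega_1
nu = {"x": 0.5, "y": 0.3, "z": 0.2}  # measure on Omega_2

# Product measure on Omega_1 x Omega_2: assigns mu(w1) * nu(w2) to each pair.
prod = {(w1, w2): mu[w1] * nu[w2] for w1, w2 in product(mu, nu)}

def measure(m, event):
    """Total mass a discrete measure m assigns to a set of points."""
    return sum(m[w] for w in event)

# Check the defining property on a measurable rectangle built from A and B.
A, B = {"a"}, {"x", "z"}
rect = {(w1, w2) for w1 in A for w2 in B}
assert abs(measure(prod, rect) - measure(mu, A) * measure(nu, B)) < 1e-12

# The product is itself a probability measure.
assert abs(sum(prod.values()) - 1.0) < 1e-12
```

Uniqueness on all of \({\mathcal{F}}_{1}\otimes {\mathcal{F}}_{2}\) then follows from agreement on rectangles, since the rectangles form a π-system generating the product sigma-algebra.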
Dubins and Heath make a stronger claim: when \({\mathcal{G}}\) is countably generated and \(({\varOmega },{\mathcal{F}}, P)\) is standard, there always exists a conditional distribution \(P_{{\mathcal{G}}}\) that is everywhere proper and finitely additive. They claim this follows from [2, Theorem 2]; however, the proof of the latter result weakens measurability, requiring only that a conditional distribution be measurable with respect to the completion of \({\mathcal{G}}\). Since we do not see how to trade a weakening of measurability for a weakening of regularity, we re-derive Lemma 1 for this particular case.
We include the two proofs here for completeness: clearly Ω = Ω1 ∩Ω2 is a measurable rectangle, so \(\mathcal {R}\) is nonempty. Suppose F = A ∩ B and G = C ∩ D are measurable rectangles. Then F ∩ G = (A ∩ C) ∩ (B ∩ D), which is a measurable rectangle in \(\mathcal {R}\). Now for \(\mathcal {A}\): since Q(Ω,.) = Q1(Ω1,.)Q2(Ω2,.) = 1 is constant, it is trivially \({\mathcal{G}}\)-measurable. Suppose Q(A,.) is \({\mathcal{G}}\)-measurable. Since Q(., ω) is a probability measure for any ω, Q(Ac, ω) = 1 − Q(A, ω) for any ω. Hence Q(Ac,.) = 1 − Q(A,.), which is again \({\mathcal{G}}\)-measurable. Lastly, let {Ai} be a countable sub-collection of disjoint sets and suppose Q(Ai,.) is measurable for all i. Then \(Q(\bigcup A_{i},.)={\sum }_{i}Q(A_{i},.)\), which is again \({\mathcal{G}}\)-measurable.
Proof. Clearly \({\int \limits }_{B}Q({\varOmega },\omega )dP={\int \limits }_{B}dP=P(B\cap {\varOmega })\). So \({\varOmega }\in \mathcal {C}\). Suppose \(A\in \mathcal {C}\) and let \(B\in {\mathcal{G}}\). Then \(P(A^{c}\cap B)=P(B)-P(A\cap B)={\int \limits }_{B}Q({\varOmega },\omega )-Q(A,\omega )dP={\int \limits }_{B}Q(A^{c},\omega )dP\). So \(A^{c}\in \mathcal {C}\). Lastly, let \(\{A_{i}\}\subset \mathcal {C}\) be a countable collection of disjoint sets. Then \({\int \limits }_{B}Q(\bigcup _{i}A_{i},\omega )dP={\int \limits }_{B}{\sum }_{i}Q(A_{i},\omega )dP={\sum }_{i}{\int \limits }_{B}Q(A_{i},\omega )dP={\sum }_{i}P(A_{i}\cap B)=P(\bigcup _{i}A_{i}\cap B)\) for any \(B\in {\mathcal{G}}\). Thus \(\bigcup _{i}A_{i}\in \mathcal {C}\).
References
Billingsley, P. (1979). Probability and measure. New York: Wiley.
Blackwell, D., & Dubins, L.E. (1975). On existence and non-existence of proper, regular, conditional distributions. The Annals of Probability, 3(5), 741–752.
Çınlar, E. (2011). Probability and stochastics Vol. 261. Berlin: Springer.
Diaconis, P., & Zabell, S.L. (1982). Updating subjective probability. Journal of the American Statistical Association, 77(380), 822–830.
Dubins, L.E. (1975). Finitely additive conditional probabilities, conglomerability and disintegrations. The Annals of Probability, 89–99.
Dubins, L.E., & Heath, D. (1983). With respect to tail sigma fields, standard measures possess measurable disintegrations. Proceedings of the American Mathematical Society, 88(3), 416–418.
Easwaran, K.K. (2008). The foundations of conditional probability. Berkeley: University of California.
Easwaran, K.K. (2015). Primitive conditional probabilities. Manuscript. https://www.dropbox.com/s/w7m57z9r3fwsjjc/condprob.pdf?raw=1.
Fitelson, B., & Hájek, A. (2017). Declarations of independence. Synthese, 194(10), 3979–3995.
Gaifman, H., & Snir, M. (1982). Probabilities over rich languages, testing and randomness. The Journal of Symbolic Logic, 47(3), 495–548.
Gyenis, Z., Hofer-Szabó, G., Rédei, M. (2017). Conditioning using conditional expectations: the Borel–Kolmogorov paradox. Synthese, 194(7), 2595–2630.
Gyenis, Z., & Rédei, M. (2017). General properties of Bayesian learning as statistical inference determined by conditional expectations. The Review of Symbolic Logic, 10(4), 719–755.
Hild, M., Jeffrey, R., Risse, M. (1999). Aumann’s “no agreement” theorem generalized. In Bicchieri, C., Jeffrey, R.C., Skyrms, B. (Eds.) The logic of strategy (pp. 92–100): Oxford University Press.
Huttegger, S.M. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8(4), 611–648.
Huttegger, S.M. (2017). The probabilistic foundations of rational learning. Cambridge: Cambridge University Press.
Jeffrey, R. (1965). The logic of decision. New York: McGraw-Hill.
Jeffrey, R. (1987). Alias Smith and Jones: the testimony of the senses. Erkenntnis, 26(3), 391–399.
Kolmogorov, A.N. (1933). Foundations of the theory of probability.
Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2, 299–318.
Popper, K.R. (1959). The logic of scientific discovery. Evanston: Routledge.
Rényi, A. (1955). On a new axiomatic theory of probability. Acta Mathematica Hungarica, 6(3-4), 285–335.
Rescorla, M. (2015). Some epistemological ramifications of the Borel–Kolmogorov paradox. Synthese, 192(3), 735–767.
Rescorla, M. (2018). A Dutch book theorem and converse Dutch book theorem for Kolmogorov conditionalization. The Review of Symbolic Logic, 1–31.
Seidenfeld, T., Schervish, M.J., Kadane, J.B. (2001). Improper regular conditional distributions. Annals of Probability, 1612–1624.
Skyrms, B. (1997). The structure of radical probabilism. In Probability, dynamics and causality (pp. 145–157): Springer.
Weisberg, J. (2011). Varieties of Bayesianism. In Handbook of the history of logic, (Vol. 10 pp. 477–551): Elsevier.
Acknowledgements
We would like to thank Adam Elga, Hans Halvorson, Jim Joyce, and Harvey Lederman for their extensive guidance and feedback. We would also like to thank Stephen Mackereth, Michael Rescorla, Teddy Seidenfeld, and two anonymous referees for their helpful comments on earlier drafts, as well as Kenny Easwaran, Zalán Gyenis, and Miklós Redéi for illuminating conversations on the topic which inspired many of these ideas. Finally, for helpful discussions, we thank Thomas Barrett, Kevin Blackwell, Laura Ruetsche, the audience at the Rutgers Foundations of Probability Seminar, April 2018, and the audience at the 20th Annual Pitt-CMU Graduate Student Philosophy Conference, March 2019.
Appendices
Appendix A: Existence Results
A.1 Example 1 (Lake)
In Section 2.5, we claimed that a Jeffrey conditionalization on σY exists in this example. To prove this claim, we have to be more precise about the background probability space \(({\varOmega },{\mathcal{F}}, P)\), which we left undefined in our discussion.
Recall that there are three quantities in the example: X denotes the distance from your boat to a particular buoy; Y denotes the distance from your boat to the pier; and Z denotes the distance between the buoy and the pier. Since Y, Z are assumed to be probabilistically independent according to P, and X is a function of Y and Z (so σX ⊂ σ{Y, Z}), a natural modeling choice for \(({\varOmega },{\mathcal{F}},P)\) is the product space \(([10,30]\times [80,120], {\mathcal{F}}_{Z}\otimes {\mathcal{F}}_{Y}, P)\) where \({\mathcal{F}}_{Y}\) and \({\mathcal{F}}_{Z}\) are the Borel algebras on [80, 120] and [10, 30] respectively, and P is the product measure P ∘ Z− 1 ⊗ P ∘ Y− 1 (recall that both P ∘ Y− 1 and P ∘ Z− 1 are uniform on their respective images). Note that, in this formal model, Y and Z are the projection maps sending each ω to its respective coordinate.
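This modeling choice can be sanity-checked by simulation. The Monte Carlo sketch below is our illustration (the sample size and tolerances are ours): it verifies the product-measure property on a rectangle and the uniform marginal of Y.

```python
import random

random.seed(0)

# Monte Carlo sketch of the Lake model: Omega = [10,30] x [80,120] with
# P = (uniform law of Z) x (uniform law of Y); Z, Y are the coordinate projections.
N = 200_000
samples = [(random.uniform(10, 30), random.uniform(80, 120)) for _ in range(N)]

# Product-measure property on a measurable rectangle:
# P(Z in [10,20], Y in [80,100]) should be 0.5 * 0.5 = 0.25.
hits = sum(1 for z, y in samples if z <= 20 and y <= 100) / N
assert abs(hits - 0.25) < 0.01

# The marginal of Y is uniform on [80, 120]: P(Y <= 90) = 0.25.
y_hits = sum(1 for _, y in samples if y <= 90) / N
assert abs(y_hits - 0.25) < 0.01
```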
With this set-up in mind, we now state the result.
Proposition 1
In Example 1, there exists a Jeffrey conditionalization of P given σY.
To prove this proposition we use the following theorem. As before, proofs are in Appendix C.
Definition 1
Given a probability space \(({\varOmega },{\mathcal{F}},P)\) and two sub-sigma-algebras \({\mathcal{G}}_{1},{\mathcal{G}}_{2}\subset {\mathcal{F}}\), we say a map \(Q: {\mathcal{G}}_{2}\times {\varOmega }\rightarrow [0,1]\) is a regular conditional distribution of \(P|_{{\mathcal{G}}_{2}}\) given \({\mathcal{G}}_{1}\) if it is the restriction of a regular conditional distribution of \(P_{{\mathcal{G}}_{1}}\) to \({\mathcal{G}}_{2}\times {\varOmega }\).
Theorem 1
Let \(({\varOmega },{\mathcal{F}},P)\) be a product measure space where Ω = Ω1 ×Ω2 and \({\mathcal{F}}={\mathcal{F}}_{1}\otimes {\mathcal{F}}_{2}\) (the smallest sigma-algebra generated by measurable rectangles of the form A ∩ B, where \(A\in {\mathcal{F}}_{1}\) and \(B\in {\mathcal{F}}_{2}\)). Let Q1, Q2 be conditional distributions of \({\mathcal{F}}_{1}\), \({\mathcal{F}}_{2}\) given \({\mathcal{G}}\), respectively. Define \(Q:{\mathcal{F}}\times {\varOmega }\rightarrow \mathbb {R}\) such that Q(., ω) is the unique product measure Q1(., ω) ⊗ Q2(., ω) (see Footnote 1). Then Q is regular and \({\mathcal{G}}\)-measurable. In particular, if \({\mathcal{G}}={\mathcal{F}}_{2}\), then we may take Q2(., ω) = δω, in which case Q is a prcd.
A.2 Example 6 (Coin Machine)
In this example, we set Ω = {H, T}× [0, 1]. \({\mathcal{F}}\) is the sigma-algebra generated by sets of the form {H}× [0, a] and {T}× [0, b], where a, b ∈ [0, 1]. (Equivalently, \({\mathcal{F}}\) is the product sigma-algebra \({\mathcal{F}}_{1}\otimes {\mathcal{F}}_{2}\) where \({\mathcal{F}}_{1}\) is the four-element algebra {∅,{H},{T},{H, T}} and \({\mathcal{F}}_{2}\) is the Borel algebra on [0, 1].)
The prior probability assignment \(P:{\mathcal{F}}\rightarrow [0,1]\) is the unique probability distribution satisfying \(P\{(H,a): a\leq b\}=\frac {b^{2}}{2}\). (Let \(Y:{\varOmega }\rightarrow [0,1]\) be the projection map onto the second coordinate. Then note that P ∘ Y− 1 is the uniform distribution on [0, 1], as desired.)
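This prior can also be sanity-checked by simulation. The sketch below is our illustration; the sampling scheme is the natural one suggested by the example (draw a bias uniformly, then toss once), and the sample size and tolerances are ours.

```python
import random

random.seed(1)

# Monte Carlo sketch of the Coin Machine prior: Y ~ Uniform[0,1] picks a coin
# bias, and the coin then lands H with probability Y.
N = 400_000
draws = []
for _ in range(N):
    y = random.random()
    side = "H" if random.random() < y else "T"
    draws.append((side, y))

# Check P{(H, a) : a <= b} = b^2 / 2 at b = 0.8, i.e. 0.32.
b = 0.8
est = sum(1 for side, y in draws if side == "H" and y <= b) / N
assert abs(est - b * b / 2) < 0.01

# The marginal of Y stays uniform: P(Y <= 0.5) = 0.5.
assert abs(sum(1 for _, y in draws if y <= 0.5) / N - 0.5) < 0.01
```

The identity P{(H, a) : a ≤ b} = ∫₀ᵇ y dy = b²/2 is exactly what the simulation estimates.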
With this set-up in mind, we state the existence result.
Proposition 2
In Example 6, there exists a complete uniform extension \(E^{P}:{\varDelta }(\sigma Y)\rightarrow {\varDelta }({\mathcal{F}})\) satisfying \([E^{P}({\delta _{b}^{Y}})](H)=b\) for all b ∈ [0, 1]. (Recall that \({\delta _{b}^{Y}}\) is the Dirac measure concentrated on the point Y = b. Intuitively, \([E^{P}({\delta _{b}^{Y}})](H)\) is the conditional probability P(H|Y = b).)
Appendix B: Conditional Densities
Definition 2
Let \(Y: {\varOmega } \rightarrow \mathbb {R}^{n}\) be an absolutely continuous random variable. A probability density function (pdf) for Y is a map \(f_{Y}: \mathbb {R}^{n} \rightarrow \mathbb {R}\) which satisfies \({\int \limits }_{B} f_{Y}(y) dy = P(Y \in B)\) for every Borel set \(B \subset \mathbb {R}^{n}\). A joint pdf for \({\mathcal{F}}\) and Y is a map \(f_{Y}: {\mathcal{F}} \times \mathbb {R}^{n} \rightarrow \mathbb {R}\) which satisfies \({\int \limits }_{B} f_{Y}(F,y) dy = P(F \cap (Y \in B))\) for all \(F \in {\mathcal{F}}\) and every Borel set \(B \subset \mathbb {R}^{n}\).
Theorem 2 (Conditional Densities)
Let \(Y: {\varOmega } \rightarrow \mathbb {R}^{n}\) be an absolutely continuous random variable. Let fY be a pdf for Y, let S denote the set of values supported by fY(.), and let qY denote the measure q(Y ∈ .). Set \({\mathcal{G}} = \sigma Y\). Any Jeffrey conditionalization (or, equivalently, any uniform extension—see Section ??), \(E^{P}: {\varTheta }({\mathcal{G}}) \rightarrow {\varDelta }({\mathcal{F}})\), can be written in the form
for all \(q \in {\varTheta }({\mathcal{G}})\), where P(Y ∈ S) = 1, fY is a joint pdf for \({\mathcal{F}}\)and Y, and \(g: {\mathcal{F}} \times \mathbb {R}^{n} \rightarrow \mathbb {R}\)is some function. Furthermore, fix \(F \in {\mathcal{F}}\). Then
for P-almost every value \(b \in \mathbb {R}^{n}\).
Appendix C: Proofs
C.1 Propositions
Proof
(Proposition 1) It suffices to show that, for all ω ∈ Gi,
We will show it for Q; the reasoning for P is the same. Since the derivative \(\frac {dQ(F \cap .)}{dq(.)}\) is \({\mathcal{G}}\)-measurable and \({\mathcal{G}}\) is generated by π, it must be of the form \({\sum }_{j} c_{j} \cdot 1_{G_{j}}\). By the integral condition, \(Q(F \cap G_{i})= {\int \limits }_{G_{i}} {\sum }_{j} c_{j} 1_{G_{j}}(\omega ) dq(\omega ) = {\sum }_{j} c_{j} ({\int \limits }_{G_{i}} 1_{G_{j}}(\omega ) dq(\omega )) = c_{i}\cdot q(G_{i})\), so \(c_{i} = \frac {Q(F \cap G_{i})}{q(G_{i})}\) as desired. □
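In the finite, positive-credence case, the identity \(c_{i} = \frac {Q(F \cap G_{i})}{q(G_{i})}\) is just classical Jeffrey conditionalization, Q(F) = Σi q(Gi)P(F|Gi). The discrete sketch below is our illustration (all names and numbers are ours); it checks faithfulness and rigidity directly.

```python
# Finite Jeffrey conditionalization sketch.
# Omega = {0,...,5}; the partition generates the sub-sigma-algebra.
P = {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.1}
partition = [{0, 1, 2}, {3, 4, 5}]
q_cell = [0.7, 0.3]  # new credences q(G_0), q(G_1) over the partition

def prob(m, event):
    return sum(m[w] for w in event)

def jeffrey(P, partition, q_cell, F):
    """Q(F) = sum_i q(G_i) * P(F | G_i): rigidity fixes Q within each cell."""
    return sum(q * prob(P, F & G) / prob(P, G) for q, G in zip(q_cell, partition))

F = {1, 2, 3}
Q_F = jeffrey(P, partition, q_cell, F)

# Faithfulness: Q agrees with q on the partition cells.
assert abs(jeffrey(P, partition, q_cell, partition[0]) - 0.7) < 1e-9
# Rigidity: Q(F | G_0) = P(F | G_0).
Q_FG0 = jeffrey(P, partition, q_cell, F & partition[0])
assert abs(Q_FG0 / 0.7 - prob(P, F & partition[0]) / prob(P, partition[0])) < 1e-9
```

Here Q(F) = 0.7 · (0.3/0.4) + 0.3 · (0.2/0.6) = 0.625, exactly the constant-on-cells density the proof identifies.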
Proof
(Proposition 2) (⇒) Immediate from the definition of a Radon-Nikodym derivative. For (⇐) it suffices to show
We use the fact \(Q: {\mathcal{F}} \rightarrow [0,1]\) is a probability measure and \(Q=q|_{{\mathcal{G}}}\). In particular we borrow reasoning from the proof of Theorem 2, below. First use the reasoning in (⇒) to show \(\frac {dP(\cdot \cap .)}{dp(.)}(\omega ):{\mathcal{F}} \rightarrow [0,1]\) is a probability measure and \(\frac {dP(G \cap .)}{dp(.)}(\omega )=1_{G}(\omega )\) for all \(G \in {\mathcal{G}}\) and ω ∈Ω. Then use this, together with the reasoning in (⇐), to show \({\int \limits }_{G} \frac {dP(F\cap .)}{dp(.)} dq = {\int \limits } \frac {dP(F \cap G\cap .)}{dp(.)} dq\) which equals Q(F ∩ G) by assumption. □
Proof
(Proposition 3) Fix \(F \in {\mathcal{F}}\) and \(q \in {\varTheta }({\mathcal{G}})\). Note that q(.) = 0 ⇒ [EP(q)](F ∩ .) = 0 by faithfulness and the fact EP(q) is a probability measure. Thus the Radon-Nikodym theorem implies there is a density \(K^{P}(F,q): {\varOmega } \rightarrow \mathbb {R}\) which satisfies both equations in the proposition. We then let KP be the map from pairs (F, q) to the associated densities. □
Proof
(Proposition 4) Supposing EP is uniform, we obtain the result using the same reasoning as in the proof of Proposition 1 combined with the result of Proposition 5 below. And for the converse note that this EP is obviously uniform. □
Proof
(Proposition 5) (⇒) Let KP(F, q) be the density \(\frac {dP(F \cap .)}{dp(.)}\) satisfying rigidity. It does not depend on q at all, and conservativeness follows from the generalized law of total probability. (⇐) Let \(K^{P}: {\mathcal{F}} \rightarrow {\mathcal{L}}^{1}({\mathcal{G}})\) be the uniform schema (without loss of generality, we omit the second argument place). We claim KP(F) is a density satisfying rigidity. It is \({\mathcal{G}}\)-measurable. By conservativeness and the definition of a schema (see Proposition 3) we have \(P(F \cap G)=[E^{P}(p)](F \cap G) = {\int \limits }_{G} K^{P}(F) dp\), which is the law of total probability. Thus it is a density. To see that it satisfies rigidity, note that also,
and so, for all \(q \in {\varTheta }({\mathcal{G}})\),
almost surely with respect to q, as desired. □
Proof
(Proposition 6) The proof is a slight variant of [24, Lemma 2 and Theorem 3]. It suffices to show that, for any almost regular conditional distribution \(P_{{\mathcal{G}}}\) on \({\mathcal{G}}\),
This suffices because, fixing such an ω and letting Gω be the atom containing ω (see footnote ??), it follows that \(P_{{\mathcal{G}}}(G_{\omega }, \omega ) = P(G_{\omega }) = 0 \neq 1\) (here we use the fact that Gω has countably many elements, and hence is P-zero). Since \(G_{\omega } \in {\mathcal{G}}\) (since this \({\mathcal{G}}\) is atomic; see footnote ??), this is a violation of propriety at ω. To show this conclusion holds, first we note, by the Kolmogorov zero–one law, that P(G) equals 0 or 1 for every \(G \in {\mathcal{G}}\). Next we note that \({\mathcal{F}}\) is countably generated. For each generator Fi, we define
Since \(P_{{\mathcal{G}}}(G_{i},\cdot )\) is \({\mathcal{G}}\)-measurable and P(Gi) is constant, these three sets belong to \({\mathcal{G}}\) and hence have probability either 0 or 1. Now suppose toward contradiction that \(H_{i}^{+}\) has probability 1. By the generalized law of total probability, we then obtain
which is a contradiction. The same reasoning applies to \(H^{-}_{i}\). It then follows that Hi has probability one, for any i. Now let S denote the probability-one set on which \(P_{{\mathcal{G}}}\) is regular, and define D ≡ (∩iHi) ∩ S. First note that P(D) = 1. Next note that, by definition:
Since the Fi generate \({\mathcal{F}}\), if two probability measures agree on all the Fi then they must agree on all of \({\mathcal{F}}\). And so \( \omega \in D \Longrightarrow P_{{\mathcal{G}}}(\cdot ,\omega )=P(\cdot )\). Since P(D) = 1, this is enough to establish the conclusion. □
Proof
(Proposition 7) We will show that, in Example 7, there exists a proper conditional distribution \(P_{{\mathcal{G}}}\) such that \(P_{{\mathcal{G}}}(.,\omega )\) is a finitely additive probability distribution for any ω ∈Ω. Then, by the same reasoning as the proof of Theorem 2 (direction ⇐), it follows that a complete uniform extension* on \({\mathcal{G}}\) exists—note, crucially, that the demonstration of rigidity relies only on finite additivity, not on countable additivity. First, in Example 7 define \(X_{n}:{\varOmega }\rightarrow \{0,1\}\) to be the map that projects each ω to its n-th coordinate. Intuitively, Xn records the outcome of the n-th toss. Let \({\mathcal{F}}_{n}=\sigma \{X_{1},\dots ,X_{n-1}\}\) and \({\mathcal{G}}_{n}=\sigma \{X_{m}: m\geq n\}\). Then \({\mathcal{G}}=\bigcap _{n}{\mathcal{G}}_{n}\) and \({\mathcal{F}}={\mathcal{F}}_{n}\vee {\mathcal{G}}_{n}\). (Equivalently, \({\mathcal{F}}={\mathcal{F}}_{n}\otimes {\mathcal{G}}_{n}\) where \({\mathcal{F}}_{n}\) and \({\mathcal{G}}_{n}\) are viewed as sigma-algebras on \({\varOmega }_{n}=\{0,1\}^{n-1}\) and Ω respectively. Note that Ω = Ωn ×Ω for all n. In what follows we use these two notations interchangeably.) By Lemma 1 below, there exists a conditional distribution of P on \({\mathcal{G}}_{n}\) for each n. Next, a definition: given a measurable space \(({\varOmega },{\mathcal{F}})\) and \({\mathcal{G}}\subset {\mathcal{F}}\), we say \({\mathcal{G}}\) is tame if, for any countably additive probability distribution P on \(({\varOmega },{\mathcal{F}})\), there exists a proper conditional distribution of P given \({\mathcal{G}}\) which is finitely additive (for any ω ∈Ω). By Lemma 2 below (a result due to Dubins and Heath), it then follows from the above that there exists a proper conditional distribution of P given the tail sigma-field \({\mathcal{G}}\) which is finitely additive (for any ω ∈Ω), as claimed. □
Lemma 1
For each n, there is a conditional distribution \(P_{{\mathcal{G}}_{n}}\) of P on \({\mathcal{G}}_{n}\) such that \(P_{{\mathcal{G}}_{n}}(\cdot , \omega )\) is finitely additive for all ω ∈Ω (see Footnote 2).
Lemma 2 (Dubins and Heath [6, Proposition 1])
The intersection of a decreasing sequence of sigma fields, each of which is tame, is tame.
Proof
(Lemma 1) Let \(P_{n}=P|_{{\mathcal{F}}_{n}}\). First, we claim that, for each n, there exists a regular conditional distribution of Pn given \({\mathcal{G}}_{n}\). Consider \(Q_{n}: {\mathcal{F}}_{n}\times {\varOmega }\rightarrow [0,1]\) such that Qn(F, ω) = P(F) for any ω ∈Ω. We verify that Qn so defined is an additive conditional distribution of \(X_{1},\dots ,X_{n-1}\) given {Xm : m ≥ n}. Both measurability and additivity are trivial. So it remains to verify that Qn satisfies the generalized law of total probability in the sense that \(P(F\cap G)={\int \limits }_{G}Q_{n}(F,\omega )dP\) for any \(F\in {\mathcal{F}}_{n}\) and \(G\in {\mathcal{G}}_{n}\). Let \(F\in {\mathcal{F}}_{n}\) and \(G\in {\mathcal{G}}_{n}\). Then \({\int \limits }_{G}Q_{n}(F,\omega )dP=P(F)P(G)=P(F\cap G)\), where the last equality follows from the fact that \({\mathcal{F}}_{n}\) is independent of \({\mathcal{G}}_{n}\). Now define \(P_{{\mathcal{G}}_{n}}:{\mathcal{F}}\times {\varOmega }\rightarrow [0,1]\) as the product measure \(P_{{\mathcal{G}}_{n}}(.,\omega )=Q_{n}(.,\omega )\otimes \delta _{\omega }(.)\), where we recall that \({\mathcal{F}}={\mathcal{F}}_{n}\otimes {\mathcal{G}}_{n}\). By Theorem 6 (Appendix A), \(P_{{\mathcal{G}}_{n}}\) is a prcd of P on \({\mathcal{G}}_{n}\). □
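The independence step in this proof can be checked concretely in a finite model. In the sketch below (our illustration; two independent fair tosses stand in for \({\mathcal{F}}_{n}\) and \({\mathcal{G}}_{n}\)), the constant kernel satisfies the generalized law of total probability.

```python
from itertools import product

# Model: two independent fair coin tosses, Omega = {H,T} x {H,T}.
omega = list(product("HT", "HT"))
P = {w: 0.25 for w in omega}

def prob(event):
    return sum(P[w] for w in event)

F = {w for w in omega if w[0] == "H"}  # event in sigma(X_1)
G = {w for w in omega if w[1] == "T"}  # event in sigma(X_2)

# Constant kernel Q_n(F, w) = P(F), so integral_G Q_n(F, .) dP = P(F) * P(G),
# which equals P(F cap G) precisely because the two sigma-algebras are independent.
lhs = sum(prob(F) * P[w] for w in G)
assert abs(lhs - prob(F & G)) < 1e-12
```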
Proof
(Proposition 8) By Corollary 1 and 2, such a Jeffrey conditionalization exists just in case there is a proper, regular conditional distribution \(P_{{\mathcal{G}}}\), where \({\mathcal{G}}=\sigma Y\). First, we claim there is a regular conditional distribution of Z given Y. Consider \(Q_{Z}: \sigma Z\times {\varOmega }\rightarrow [0,1]\) given by
It is straightforward to verify QZ is a regular conditional distribution: Regularity and measurability are trivial; since Y, Z are independent, for any \(A\in {\mathcal{F}}_{Z}\) and \(B\in {\mathcal{F}}_{Y}\), \(P(A\cap B)=P(A)P(B)={\int \limits }_{B}Q_{Z}(A,\omega )dP\). So QZ also satisfies the generalized law of total probability. Now, let \({\mathcal{F}}_{1}={\mathcal{F}}_{Z}\), \({\mathcal{F}}_{2}={\mathcal{F}}_{Y}\), Q1 = QZ and Q2 be the trivial conditional distribution that maps each (A, ω) to δω(A). By Theorem 6 (Appendix A), \(P_{{\mathcal{G}}}\) defined as \(P_{{\mathcal{G}}}(.,\omega )=Q_{1}(.,\omega )\otimes Q_{2}(.,\omega )\) is an everywhere prcd of P given σY. □
Proof
(Proposition 9) Let \(X:{\varOmega }\rightarrow \{H,T\}\) be the other projection map and PX = P|σX. As above, the strategy is to construct a regular conditional distribution of PX given σY and extend it to a prcd of P on σY. Define \(Q_{X}:\sigma X\times {\varOmega }\rightarrow [0,1]\) as
We check that QX so defined is in fact a conditional distribution of PX given σY. First, observe that QX({H, T}, ω) = 1, QX(∅, ω) = 0, QX({H}, ω) = Y (ω) and QX({T}, ω) = 1 − Y (ω) for any ω ∈Ω.
-
(measurability) By a well-known result in measure theory (e.g. [3, Theorem II.4.4]), a real-valued function \(f:{\varOmega }\rightarrow \mathbb {R}\) is measurable with respect to \({\mathcal{G}}=\sigma Y\) iff there is a real-valued function g on \(\mathbb {R}\) such that f = g ∘ Y. So it suffices to show that for any A ∈ σX, the map QX(A,.) is a function of Y. Consider \(f_{n}(\omega )=\frac {P((X\in A)\cap (|Y-Y(\omega )|\leq 1/n))}{P(|Y-Y(\omega )|\leq 1/n)}\) and \(g_{n}(x)=\frac {P((X\in A)\cap (|Y-x|\leq 1/n))}{P(|Y-x|\leq 1/n)}\). Then fn(ω) = gn ∘ Y (ω) and \(Q_{X}(A,.)=\lim _{n}f_{n}\). Since each fn is \({\mathcal{G}}\)-measurable and measurability is closed under pointwise limits, it follows that QX(A,.) is \({\mathcal{G}}\)-measurable.
-
(regularity) This follows trivially from the above observation about values of QX(., ω).
-
(generalized law of total probability) Let A ∈ σX and B ∈ σY. Clearly if A = {H, T}, then
$${\int}_{B}Q_{X}(A,\omega)dP=P(B)=P((X\in A)\cap B).$$ Similarly, if A = ∅, then \({\int \limits }_{B}Q_{X}(A,\omega )dP=0=P((X\in A)\cap B)\). So it remains to check the cases of A = {H} and A = {T}. Let A = {H} and let μ = P ∘ Y− 1 be the uniform distribution on [0, 1]. Then
$$ {\int}_{B} Q_{X}(H,\omega)dP={\int}_{B} Y(\omega)dP={\int}_{Y(B)} t \mu(dt)=P(H\cap B). $$ (3) The case of A = {T} follows analogously.
Now note that \({\mathcal{F}}=\sigma X\otimes \sigma Y\). Let Q1 = QX and \(Q_{2}:\sigma Y\times {\varOmega }\rightarrow [0,1]\) defined as Q2(., ω) = δω. By Theorem 6 in Appendix A, \(P_{{\mathcal{G}}}(.,\omega )=Q_{1}(.,\omega )\otimes Q_{2}(.,\omega )\) is a prcd of P given σY. By Corollary 2, if \(P_{{\mathcal{G}}}\) is a prcd of P given σY, then
defines a complete uniform extension of σY (given P). Thus there is a complete uniform extension in Example 6. Moreover, note that \(P_{{\mathcal{G}}}(H,\omega )=Q_{1}(H,\omega )=Y(\omega )\). So
as desired. □
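The ratio gn used in the measurability argument above can also be computed in closed form under the Coin Machine prior, since P(X = H, Y ∈ [c, d]) = (d² − c²)/2 and P(Y ∈ [c, d]) = d − c. The sketch below is our illustration (the function name is ours); it checks numerically that gn(b) converges to b.

```python
def g(b, n):
    """g_n(b) = P(X = H, |Y - b| <= 1/n) / P(|Y - b| <= 1/n), computed in
    closed form for the Coin Machine prior; the window is truncated to [0, 1]."""
    c, d = max(0.0, b - 1 / n), min(1.0, b + 1 / n)
    return ((d * d - c * c) / 2) / (d - c)

b = 0.3
# For interior points the ratio is the midpoint of the window, hence exactly b
# as soon as [b - 1/n, b + 1/n] lies inside [0, 1].
assert abs(g(b, 100) - b) < 1e-12

# Near the boundary the truncation matters, but the limit is still b.
assert abs(g(0.0, 10**6) - 0.0) < 1e-5
```

This matches the proof's conclusion that \(Q_{X}(\{H\},.)=\lim _{n}f_{n}=Y\).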
C.2 Theorems
Proof
(Theorem 2) (⇐) We first show that the assignment sends each \(q \in {\varDelta }({\mathcal{G}})\) to a Q which is a probability measure, i.e. belongs to \({\varDelta }({\mathcal{F}})\). This follows essentially from the fact that \(P_{{\mathcal{G}}}\) is regular. More specifically:
-
(non-negativity) \(\forall F \in {\mathcal{F}}\), \(Q(F)= {\int \limits }_{{\varOmega }} P_{{\mathcal{G}}}(F, \omega ) dq(\omega ) \geq 0\) since \(P_{{\mathcal{G}}}(F,\omega ) \geq 0\) for all \(F \in {\mathcal{F}}\);
-
(normalizability) \(Q({\varOmega }) = {\int \limits }_{{\varOmega }} P_{{\mathcal{G}}}({\varOmega }, \omega ) dq(\omega ) = {\int \limits }_{{\varOmega }} 1 dq(\omega ) = q({\varOmega }) = 1\);
-
(countable additivity) Let {Fi}i∈I be a countable collection of disjoint sets in \({\mathcal{F}}\) (I may be finite). Then,
$$ \begin{array}{@{}rcl@{}} Q(\cup_{i} F_{i})&=&{\int}_{{\varOmega}} P_{\mathcal{G}}(\cup_{i} F_{i}, \omega) dq(\omega) = {\int}_{{\varOmega}} \sum\limits_{i} P_{\mathcal{G}}(F_{i},\omega) dq(\omega) \\ &=& \sum\limits_{i} {\int}_{{\varOmega}} P_{\mathcal{G}}(F_{i},\omega) dq(\omega)= \sum\limits_{i} Q(F_{i}), \end{array} $$ (6) where we use the Monotone Convergence Theorem.
Next we show \(Q|_{{\mathcal{G}}} = q\). This follows from propriety; for all \(G \in {\mathcal{G}}\),
Finally, for rigidity, fix \(F \in {\mathcal{F}}\) and note that \(P_{{\mathcal{G}}}(F, \cdot ): {\varOmega } \rightarrow \mathbb {R}\) is a density \(\frac {dP(F \cap .)}{dp(.)} : {\varOmega } \rightarrow \mathbb {R}\) since the generalized law of total probability is precisely the integral condition for a Radon-Nikodym derivative, where μ = p and ν = P(F ∩ .). So it suffices to show,
for all \(G \in {\mathcal{G}}\). By regularity \(P_{{\mathcal{G}}}(F, \omega )=P_{{\mathcal{G}}}(F \cap G, \omega )+P_{{\mathcal{G}}}(F \cap G^{c}, \omega )\) and so it suffices to show
By regularity and propriety, \(P_{{\mathcal{G}}}(F \cap G, \omega ) \leq P_{{\mathcal{G}}}(G,\omega ) =1_{G}(\omega )\) and \(P_{{\mathcal{G}}}(F \cap G^{c}, \omega ) \leq P_{{\mathcal{G}}}(G^{c},\omega ) =1_{G^{c}}(\omega )\). So if ω∉G then \(P_{{\mathcal{G}}}(F\cap G,\omega ) = 0\) and if ω ∈ G then \(P_{{\mathcal{G}}}(F \cap G^{c}, \omega )=0\). Thus the above equation is equivalent to
By assumption the right-hand side equals Q(F ∩ G) and so we are done.
(⇒) Consider the map \(P_{{\mathcal{G}}}\) defined in the statement of the theorem. By definition \(P_{{\mathcal{G}}}(F, \cdot )\) is \({\mathcal{G}}\)-measurable and satisfies the generalized law of total probability since \(\frac {dP(F \cap .)}{dp(.)}\) is \({\mathcal{G}}\)-measurable and satisfies the integral condition for a Radon-Nikodym derivative, where μ = p and ν = P(F ∩ .). It remains to show propriety and regularity.
-
(propriety) Fix \(G \in {\mathcal{G}}\). Let \(H_{G}^{-} = \{\omega \in {\varOmega }: P_{{\mathcal{G}}}(G,\omega ) < 1_{G}(\omega )\}\). Note that \(H_{G}^{-} \in {\mathcal{G}}\). Suppose toward contradiction that \(H_{G}^{-} \neq \emptyset \). Choose \(q \in {\varDelta }({\mathcal{G}})\) such that \(q(H_{G}^{-})=1\), for example a Dirac measure concentrated at a point in \(H_{G}^{-}\). By rigidity,
$$ \begin{array}{@{}rcl@{}} Q(G)&=&{\int}_{{\varOmega}} P_{\mathcal{G}}(G, \omega) dq(\omega)={\int}_{H_{G}^{-}} P_{\mathcal{G}}(G, \omega) dq(\omega)\\ &&< {\int}_{{\varOmega}} 1_{G}(\omega) dq(\omega) = q(G), \end{array} $$which violates the faithfulness requirement \(Q|_{{\mathcal{G}}} = q\), a contradiction. Similar reasoning with \(H_{G}^{+} = \{\omega \in {\varOmega }: P_{{\mathcal{G}}}(G,\omega ) > 1_{G}(\omega )\}\) implies \(P_{{\mathcal{G}}}(G,\omega )=1_{G}(\omega )\) for all ω ∈Ω, as desired.
-
(regularity) Fix a countable collection \(\mathcal {C}=\{F_{i}\}_{i \in I}\) of disjoint sets in \({\mathcal{F}}\). Let \(A_{\mathcal {C}}^{-} = \{\omega \in {\varOmega }: P_{{\mathcal{G}}}(\cup _{i} F_{i}, \omega ) < {\sum }_{i} P_{{\mathcal{G}}}(F_{i},\omega )\}\) which belongs to \({\mathcal{G}}\) [1, Theorem 13.4]. Using the same reasoning as above: Pick \(q \in {\varDelta }({\mathcal{G}})\) such that \(q(A_{\mathcal {C}}^{-})=1\). We have, using the Monotone Convergence Theorem,
$$ \begin{array}{@{}rcl@{}} Q(\cup_{i} F_{i}) &=& {\int}_{{\varOmega}} P_{\mathcal{G}}(\cup_{i} F_{i}, \omega) dq(\omega)={\int}_{A_{\mathcal{C}}^{-}} P_{\mathcal{G}}(\cup_{i} F_{i}, \omega) dq(\omega)\\ &&< {\int}_{{\varOmega}} \sum\limits_{i} P_{\mathcal{G}}(F_{i},\omega) dq(\omega) = \sum\limits_{i} {\int}_{{\varOmega}} P_{\mathcal{G}}(F_{i},\omega) dq(\omega) = \sum\limits_{i} Q(F_{i}), \end{array} $$which contradicts the assumption \(Q \in {\varDelta }({\mathcal{F}})\). Repeat with \(A_{\mathcal {C}}^{+}\) to show \(\{\omega \in {\varOmega }: P_{{\mathcal{G}}}(\cup _{i} F_{i}, \omega ) = {\sum }_{i} P_{{\mathcal{G}}}(F_{i},\omega )\}={\varOmega }\) as desired.
And we are done. □
Proof
(Theorem 3) By the Radon–Nikodym theorem there exists a conditional distribution \(P_{{\mathcal{G}}}: {\mathcal{F}} \times {\varOmega } \rightarrow \mathbb {R}\). Let \(E^{P}: {\varDelta }^{+}({\mathcal{G}}) \rightarrow {\varDelta }({\mathcal{F}})\) be the map defined by
We claim that EP is a uniform minimal extension. Note it is uniform and minimal by construction. It is a standard result [1, Theorem 33.2] that, fixing some countable disjoint sequence {Fi}i∈I, for P-almost every ω ∈Ω we have \(0 \leq P_{{\mathcal{G}}}(F_{i},\omega ) \leq 1\) and \(P_{{\mathcal{G}}}(\cup _{i} F_{i}, \omega ) = {\sum }_{i} P_{{\mathcal{G}}}(F_{i}, \omega )\). So,
where the first equality follows from the fact that q is absolutely continuous with respect to P, and the second follows from the same reasoning as Eq. 6. Thus \(E^{P}(q) \in {\varDelta }({\mathcal{F}})\). For faithfulness we use the well-known result that, for any \(G \in {\mathcal{G}}\), the conditional expectation of its indicator with respect to \({\mathcal{G}}\) is almost surely equal to that indicator, i.e. we have \(P_{{\mathcal{G}}}(G,\omega )=1_{G}(\omega )\) for P-almost every ω ∈Ω (this property is sometimes called conditional determinism [3, p. 144]). Since q is absolutely continuous with respect to p,
For conservativeness note,
using the generalized law of total probability.
Finally, we show uniqueness. Suppose \(\bar {E}^{P}: {\varDelta }^{+}({\mathcal{G}}) \rightarrow {\varDelta }({\mathcal{F}})\) is another uniform extension. Thanks to conservativeness, the associated schema KP must be such that \(K^{P}(\cdot , q)(\cdot ): {\mathcal{F}} \times {\varOmega } \rightarrow \mathbb {R}\) is a conditional distribution of P on \({\mathcal{G}}\). KP(F, q) is unique P-almost surely and does not vary as q varies, except perhaps on a P-null set. Since every \(q \in {\varDelta }^{+}({\mathcal{G}})\) is absolutely continuous with respect to P, it follows that KP(F, q) can be treated as \(P_{{\mathcal{G}}}(F,\cdot )\) from above, and hence \(E^{P}(q)=\bar {E}^{P}(q)\) for all \(q \in {\varDelta }^{+}({\mathcal{G}})\). □
Proof
(Theorem 4) (⇐) Let S be the probability-one set on which regularity and propriety are both satisfied. The proof proceeds in much the same way as that of Theorem 2, except that we must concentrate the integrals over S. Since S may not belong to \({\mathcal{G}}\), the adjustment is not completely trivial. To see how this is done, consider non-negativity. We fix any \(F \in {\mathcal{F}}\) and any \(q \in {\varDelta }^{S}({\mathcal{G}})\). The trick is to define
which belongs to \({\mathcal{G}}\) (since \(P_{{\mathcal{G}}}(F,\cdot )\) is \({\mathcal{G}}\)-measurable) and falls inside the complement of S. Hence, by definition of \({\varDelta }^{S}({\mathcal{G}})\), q(NF) = 0 and so (since q is a probability measure) \(q({N_{F}^{c}})=1\). Now we can freely swap out the domain of integration Ω (in the integrals in the proof of Theorem 2) with the complement of NF, allowing us to show: \(Q(G) = {\int \limits }_{{\varOmega }} P_{{\mathcal{G}}}(F,\omega ) dq(\omega ) = {\int \limits }_{{N_{F}^{c}}} P_{{\mathcal{G}}}(F,\omega ) dq(\omega ) \geq 0\). For the other properties we repeat the same strategy. For normalizabilty we fix any \(q \in {\varDelta }^{S}({\mathcal{G}})\) and consider
which belongs to \({\mathcal{G}}\) and falls in the complement of S, hence \(q(I^{c}_{{\varOmega }})=1\). For countable additivity, we fix any \(q \in {\varDelta }^{S}({\mathcal{G}})\) and any countable disjoint collection \(\mathcal {C}=\{F_{i}\}_{i \in I}\) and consider
$$ A_{\mathcal{C}} = \left\{\omega \in \varOmega : P_{\mathcal{G}}\left(\cup_{i} F_{i},\omega\right) \neq \sum\limits_{i} P_{\mathcal{G}}(F_{i},\omega)\right\}, $$
which belongs to \({\mathcal{G}}\) [1, Theorem 13.4] and falls in the complement of S, hence \(q(A^{c}_{\mathcal {C}})=1\). To show faithfulness we fix any \(q \in {\varDelta }^{S}({\mathcal{G}})\) and any \(G \in {\mathcal{G}}\) and consider
$$ H_{G} = \{\omega \in \varOmega : P_{\mathcal{G}}(G,\omega) \neq 1_{G}(\omega)\}, $$
which belongs to \({\mathcal{G}}\) and falls in the complement of S, hence \(q({H^{c}_{G}})=1\). For conservativeness note that for all \(F \in {\mathcal{F}}\), \([E^{P}(p)](F) ={\int \limits }_{{\varOmega }} P_{{\mathcal{G}}}(F,\omega ) dp(\omega ) = P(F \cap {\varOmega }) = P(F)\), since \(P_{{\mathcal{G}}}\) satisfies the generalized law of total probability. Finally, it remains to show that \(P_{{\mathcal{G}}}\) is a uniform schema. Uniformity is obvious, but we need to show that \(P_{{\mathcal{G}}}\) plays the role of a schema (recall Proposition 3), i.e., letting Q ≡ [EP(q)],
We can use the same argument as for Theorem 2, except, as above, we must concentrate on S. To do this we fix any \(F\in {\mathcal{F}}\) and \(G\in {\mathcal{G}}\). We then take the union of the above sets N, A, H as applied to all the logical combinations of this F and G, and (in the case of A) to all the disjoint collections (e.g. {F ∩ G, F ∩ Gc}) that can be formed from these combinations. This finite union belongs to \({\mathcal{G}}\) and falls inside Sc, so q assigns 1 to its complement, the set on which all of the relevant proper and regular behavior obtains. We then repeat the reasoning from the proof of Theorem 2, freely swapping out Ω for this set in the integrals.
(⇒) Consider the map \(P_{{\mathcal{G}}}\) as defined in the statement of the theorem. By definition \(P_{{\mathcal{G}}}(F,\cdot )\) is \({\mathcal{G}}\)-measurable, and it satisfies the generalized law of total probability since, by conservativeness and the definition of a schema (recall Proposition 3),
$$ P(F \cap G) = [E^{P}(p)](F \cap G) = {\int}_{G} P_{\mathcal{G}}(F,\omega)\, dp(\omega) \quad \text{for all } G \in \mathcal{G}. $$
It remains to show almost propriety and almost regularity. We follow the same steps as in the proof of Theorem 2, except that rather than showing violations of regularity or propriety can only occur on the empty set (i.e. cannot occur at all), we show they can only occur on Sc. For example, for almost propriety, suppose toward contradiction that \(H_{G}^{-}\) is not contained in the complement of S, so that \(S \cap H_{G}^{-}\) is non-empty. We then choose q to be a Dirac measure concentrated at a point ω∗ in \(S \cap H_{G}^{-}\) and derive a contradiction in the same way. (Crucially, note that for such a q, if G ⊂ Sc, then ω∗∉G and so q(G) = 0, as required. Also, \(\omega ^{*} \in H_{G}^{-}\), and so \(q(H_{G}^{-})=1\).) For almost regularity we repeat the same reasoning with \(A_{\mathcal {C}}^{-}\). □
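The concentration trick is easy to visualize in a finite setting. In the toy sketch below (our own illustration; the kernel and measures are hypothetical), the kernel is proper on S but arbitrary on Sc. Any q in the finite analogue of \({\varDelta }^{S}({\mathcal{G}})\) assigns the bad points measure zero, so the defect never enters the integral; a q that charges Sc sees the failure of faithfulness:

```python
from fractions import Fraction as Fr

# Omega = {0,1,2}, with G the full power set, so a proper kernel at w
# must be the Dirac measure at w. Ours is proper on S = {0,1} only.
omega = [0, 1, 2]
S = {0, 1}

def kernel(F, w):
    if w in S:
        return Fr(1) if w in F else Fr(0)   # proper: the Dirac at w
    return Fr(1, 3) * len(F)                # improper junk at w = 2

def extend(q):
    return lambda F: sum(kernel(F, w) * q[w] for w in omega)

# q in Delta^S assigns zero to every subset of S^c ...
q_good = {0: Fr(1, 4), 1: Fr(3, 4), 2: Fr(0)}
# ... so the junk at w = 2 never enters the integral: faithfulness holds.
assert extend(q_good)({1}) == q_good[1]

# A q that charges S^c sees the violation: faithfulness fails on {1}.
q_bad = {0: Fr(0), 1: Fr(1, 2), 2: Fr(1, 2)}
assert extend(q_bad)({1}) != q_bad[1]
```

This mirrors the proof exactly: the set where the kernel misbehaves lies in Sc, so restricting attention to measures that annihilate Sc restores all the desired properties.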
Proof
(Theorem 6) We check that Q satisfies the two desired properties:
-
(regularity) This is trivial since Q(., ω) is a probability measure by construction.
-
(measurability) Let \(\mathcal {R}\) denote the collection of measurable rectangles. Note that \(\mathcal {R}\) forms a π-system, i.e. it is nonempty and closed under finite intersection. Let \(\mathcal {A}\) denote the collection
$$\{A\in\mathcal{F}: Q(A,.)\ \text{is}\ \mathcal{G}\text{ - measurable}\}.$$We note that \(\mathcal {A}\) is a λ-system, i.e. it contains Ω and is closed under complementation as well as countable disjoint union.Footnote 3 To prove measurability, it suffices to show that \({\mathcal{F}}\subset \mathcal {A}\). Since \({\mathcal{F}}=\sigma \mathcal {R}\), by the monotone class theorem \({\mathcal{F}}\subset \mathcal {A}\) provided \(\mathcal {R}\subset \mathcal {A}\). Let \(A\cap B\in \mathcal {R}\). Then Q(A ∩ B,.) = Q1(A,.)Q2(B,.). Since both Q1(A,.) and Q2(B,.) are \({\mathcal{G}}\)-measurable, so is their product. Thus \(\mathcal {R}\subset \mathcal {A}\).
Next, suppose \({\mathcal{G}}={\mathcal{F}}_{2}\). We show that Q is a prcd, i.e. that Q additionally satisfies propriety and the generalized law of total probability.
-
(generalized law of total probability) We adopt the same strategy as above: let
$$ \mathcal{C}=\{A\in\mathcal{F}: P(A\cap B)={\int}_{B}Q(A,\omega)dP\ \text{for any }B\in\mathcal{G}\}. $$Then \(\mathcal {C}\) is a λ-system.Footnote 4 So it remains to check that \(\mathcal {R}\subset \mathcal {C}\). This is straightforward: let \(A=A_{1}\cap A_{2}\in \mathcal {R}\) and \(B\in {\mathcal{G}}\); then \({\int \limits }_{B}Q(A,\omega )dP={\int \limits }_{B}Q_{1}(A_{1},\omega )Q_{2}(A_{2},\omega )dP={\int \limits }_{B\cap A_{2}}Q_{1}(A_{1},\omega )dP\) (using \(Q_{2}(A_{2},\omega )=\delta _{\omega }(A_{2})=1_{A_{2}}(\omega )\)), which equals \(P(A_{1}\cap (B\cap A_{2}))=P((A_{1}\cap A_{2})\cap B)=P(A\cap B)\).
-
(propriety) Let \(B\in {\mathcal{F}}_{2}\). Then Q(Ω1 ∩ B, ω) = Q1(Ω1, ω)Q2(B, ω) = δω(B) = 1B(ω), as desired.
□
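A finite analogue of the product construction makes the propriety and total-probability calculations above concrete. The sketch below is our own illustration (all names are hypothetical): Q2 is taken to be the Dirac kernel, as in the case \({\mathcal{G}}={\mathcal{F}}_{2}\), so Q(A, ω2) integrates Q1 over the slice of A at ω2:

```python
from fractions import Fraction as Fr
from itertools import product

# Omega = O1 x O2; we condition on the second coordinate. Q1 is a kernel
# on O1 indexed by w2; Q2 is the Dirac delta_{w2}, so
# Q(A, w2) = sum_x Q1({x}, w2) * 1[(x, w2) in A].
O1, O2 = [0, 1], ["a", "b"]
Q1 = {"a": {0: Fr(1, 3), 1: Fr(2, 3)},
      "b": {0: Fr(1, 2), 1: Fr(1, 2)}}
P2 = {"a": Fr(1, 4), "b": Fr(3, 4)}        # marginal on the second factor
P = {(x, y): Q1[y][x] * P2[y] for x, y in product(O1, O2)}

def Q(A, w2):
    return sum(Q1[w2][x] for x in O1 if (x, w2) in A)

# Propriety: Q(O1 x B, w2) = delta_{w2}(B) = 1_B(w2).
cylinder = {(x, y) for x, y in product(O1, O2) if y == "a"}
assert Q(cylinder, "a") == 1 and Q(cylinder, "b") == 0

# Generalized law of total probability:
# P(A ∩ (O1 x B)) = sum_{w2 in B} Q(A, w2) P2(w2).
A = {(1, "a"), (0, "b")}
for B in [{"a"}, {"b"}, {"a", "b"}]:
    lhs = sum(P[(x, y)] for (x, y) in A if y in B)
    rhs = sum(Q(A, y) * P2[y] for y in B)
    assert lhs == rhs
```

Note that A here is not a rectangle, illustrating why the π-λ argument is needed to pass from rectangles to the whole product σ-algebra.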
Proof
(Theorem 7) Let \(E^{P}: {\varTheta }({\mathcal{G}}) \rightarrow {\varDelta }({\mathcal{F}})\) be a uniform extension with \({\mathcal{G}} = \sigma Y\) and \(K^{P}: {\mathcal{F}} \rightarrow {\mathcal{L}}^{1}({\mathcal{G}})\) the associated schema (without loss of generality we omit the second argument place). By conservativeness, KP(F) is a Radon-Nikodym derivative of P(F ∩ .) with respect to p(.). It suffices to show there is a function \(g: {\mathcal{F}} \times \mathbb {R}^{n} \rightarrow \mathbb {R}\) such that,
$$ K^{P}(F)(\omega) = g(F, Y(\omega)), $$
for all \(F \in {\mathcal{F}}\). We first observe that since KP(F) is \({\mathcal{G}}\)-measurable, if \(Y(\omega ) = Y(\omega ^{\prime })\) then \(K^{P}(F)(\omega )=K^{P}(F)(\omega ^{\prime })\). For each \(y \in \mathbb {R}^{n}\), let ωy be a representative element in Y− 1(y). Define \(f_{Y}: {\mathcal{F}} \times \mathbb {R}^{n} \rightarrow \mathbb {R}\) by
We claim that fY so defined is in fact a joint pdf for \({\mathcal{F}}\) and Y. Let \(C \subset \mathbb {R}^{n}\) be Borel. Then,
as desired, where we use the fact that \(f_{Y}(y)\,dy = dp_{Y}(y)\). We then define
It is easy to check that
as desired.
For the second claim, note by the Lebesgue differentiation theorem there is a set \(T \subset \mathbb {R}^{n}\) with full measure under pY such that for all b ∈ T, we have
and
Since the intersection of two measure-one sets also has measure one, we have P(S ∩ Y− 1(T)) = 1, and for every \(b \in \mathbb {R}^{n}\) such that Y− 1(b) ⊂ S ∩ Y− 1(T) we have,
as claimed, where we use the fact that the denominator limit is non-zero. □
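In a discrete analogue of Theorem 7's factorization, a σ(Y)-measurable kernel is constant on the fibers of Y and so factors through Y. The sketch below (our own illustration; the space, measure, and variable Y are hypothetical) exhibits the factor g directly:

```python
from fractions import Fraction as Fr

# Discrete analogue of K^P(F) = g(F, Y(.)): a sigma(Y)-measurable
# function is constant on the fibers of Y, so the kernel depends on
# omega only through Y(omega).
omega = [0, 1, 2, 3]
P = {0: Fr(1, 6), 1: Fr(1, 3), 2: Fr(1, 3), 3: Fr(1, 6)}
Y = lambda w: w % 2                       # fibers {0, 2} and {1, 3}

def g(F, y):
    """g(F, y) = P(F | Y = y): the factor through Y."""
    fiber = {w for w in omega if Y(w) == y}
    return sum(P[w] for w in F & fiber) / sum(P[w] for w in fiber)

def K(F, w):
    return g(F, Y(w))                     # the kernel, via the factorization

# K(F, .) is sigma(Y)-measurable: constant on each fiber of Y.
F = {0, 1}
assert K(F, 0) == K(F, 2) and K(F, 1) == K(F, 3)

# And it is a conditional distribution: P(F ∩ {Y = y}) = g(F, y) P(Y = y).
for y in (0, 1):
    fiber = {w for w in omega if Y(w) == y}
    assert sum(P[w] for w in F & fiber) == g(F, y) * sum(P[w] for w in fiber)
```

In the continuous case the representative-element construction and the Lebesgue differentiation argument play the role that the finite fibers play here.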
Cite this article
Meehan, A., Zhang, S. Jeffrey Meets Kolmogorov. J Philos Logic 49, 941–979 (2020). https://doi.org/10.1007/s10992-019-09543-7