
Optimal probability aggregation based on generalized Brier scoring

Annals of Mathematics and Artificial Intelligence

Abstract

In this paper we combine the theory of probability aggregation with results from machine learning theory concerning the optimality of predictions under expert advice. In probability aggregation theory, several characterization results for linear aggregation exist. However, in linear aggregation the weights are not fixed but are free parameters. We show how fixing such weights by success-based scores, a generalization of Brier scoring, allows for transferring the mentioned optimality results to the case of probability aggregation.


References

  1. Arrow, K.J.: Social Choice and Individual Values, 2nd edn. Yale University Press, New Haven (1963)

  2. Brier, G.W.: Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78(1), 1–3 (1950)

  3. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, Cambridge (2006)

  4. Dietrich, F., Endriss, U., Grossi, D., Pigozzi, G., Slavkovik, M.: JA4AI – judgment aggregation for artificial intelligence (Dagstuhl Seminar 14202). Dagstuhl Reports 4(5), 27–39 (2014). https://doi.org/10.4230/DagRep.4.5.27. http://drops.dagstuhl.de/opus/volltexte/2014/4679

  5. Feldbacher-Escamilla, C.J.: An optimality-argument for equal weighting. Synthese (2018). https://doi.org/10.1007/s11229-018-02028-1

  6. Genest, C., McConway, K.J.: Allocating the weights in the linear opinion pool. J. Forecast. 9(1), 53–73 (1990). https://doi.org/10.1002/for.3980090106

  7. Genest, C., McConway, K.J., Schervish, M.J.: Characterization of externally Bayesian pooling operators. Ann. Stat. 14(2), 487–501 (1986). https://doi.org/10.1214/aos/1176349934

  8. Genest, C., Zidek, J.V.: Combining probability distributions: a critique and an annotated bibliography. Stat. Sci. 1(1), 114–135 (1986)

  9. Grossi, D., Pigozzi, G.: Judgment Aggregation: A Primer. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool, Williston (2014)

  10. Kornhauser, L.A., Sager, L.G.: Unpacking the court. Yale Law J. 96(1), 82–117 (1986). http://www.jstor.org/stable/796436

  11. Lehrer, K., Wagner, C.: Rational Consensus in Science and Society: A Philosophical and Mathematical Study. Reidel Publishing Company, Dordrecht (1981)

  12. List, C., Pettit, P.: Aggregating sets of judgments: an impossibility result. Econ. Philos. 18(1), 89–110 (2002)

  13. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. The MIT Press, Cambridge (2012)

  14. Rossi, F., Venable, K.B., Walsh, T.: A Short Introduction to Preferences: Between Artificial Intelligence and Social Choice. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool, Williston (2011)

  15. Schurz, G.: The meta-inductivist’s winning strategy in the prediction game: a new approach to Hume’s problem. Philos. Sci. 75(3), 278–305 (2008)

  16. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)


Author information

Corresponding author

Correspondence to Christian J. Feldbacher-Escamilla.

Appendix

Here we provide a proof of Theorem 2. It is a slight expansion of the proof provided in [5], which itself is loosely based on a proof provided in [16, p.253]. The main strategy of the proof is to chain inequalities such that the difference of the success rates is tightly bounded. As we demonstrate now, success-based weighting allows for an optimal bound in the sense that, in the limit, such weighting cannot be outperformed by any other inference method in terms of the success rate.
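For orientation, the bound that steps i.–xi. below establish can be summarised as follows (our restatement in the paper's notation):

$$ s^{av}_{b,T}-s^{av}_{aggr,T}~~=~~l^{av}_{aggr,T}-l^{av}_{b,T}~~\leq~~\sqrt{\frac{2\cdot\ln(n)}{T}}, $$

where b denotes the predictor with the least cumulative loss up to the horizon T; the average regret of the aggregating predictor with respect to b thus vanishes at rate \(1/\sqrt{T}\).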

Proof

In order to prove the no-regret property of the aggregating method \(P_{aggr}\), we bound the difference between the success rates of the competing predictors and that of the aggregating predictor with the help of a learning parameter η, chosen such that the resulting bound on the cumulative difference grows only sublinearly with the number of rounds t. If such a characterisation succeeds, then the difference of the average success rates vanishes in the limit; this means that the aggregating predictor is shown not to be outperformed by any other predictor in the limit. As it turns out, one can establish such a bound by choosing \(\eta =\sqrt {\frac {2\cdot \ln (n)}{T}}\). Here T is an arbitrary round, sometimes also called the prediction horizon, up to which the bound is proven [3, p.15]. In order to generalise this bound to any round t, one needs, in a second step, to get rid of the exact choice of T by employing the so-called doubling trick, according to which the prediction horizon T is doubled whenever the current round exceeds it; this increases the bound only by a constant factor and changes nothing in the limiting case, and hence allows for proving a general optimality result too. In the following proof we demonstrate the first part (for an arbitrary but fixed T); the second part of applying the doubling trick can be recapitulated with the help of [13, p.158].
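Before working through the individual steps, the following minimal sketch (in Python; our illustration, not part of the paper: the function name aggregate and the choice of a Brier-type quadratic loss are assumptions made for concreteness) shows the exponential weighting scheme that the proof analyses: n predictors issue probabilities for k values per round, the coefficients \(c_{i,t}\) are updated multiplicatively from the per-round average losses, and \(\eta=\sqrt{2\cdot\ln(n)/T}\) is fixed for the horizon T.

    import numpy as np

    def aggregate(predictions, outcomes, T):
        """Exponentially weighted linear pooling over T rounds.

        predictions: array of shape (T, n, k), i.e. P_{i,t}(v_m) for n predictors and k values.
        outcomes:    array of shape (T, k), i.e. val_t(v_m), encoded in [0, 1].
        """
        T_rounds, n, k = predictions.shape
        eta = np.sqrt(2.0 * np.log(n) / T)        # learning parameter of step ii.
        c = np.ones(n)                            # coefficients c_{i,1} = 1
        agg_loss = np.zeros(T_rounds)             # per-round average loss of P_aggr
        exp_loss = np.zeros((T_rounds, n))        # per-round average losses of the predictors
        for t in range(T_rounds):
            w = c / c.sum()                       # normalised weights w^{av}_{i,t}
            p_aggr = w @ predictions[t]           # weighted-average prediction (AGGR)
            agg_loss[t] = np.mean((p_aggr - outcomes[t]) ** 2)                  # Brier-type loss
            exp_loss[t] = np.mean((predictions[t] - outcomes[t]) ** 2, axis=1)
            c = c * np.exp(-eta * exp_loss[t])    # c_{i,t+1} = c_{i,t} * exp(-eta * avg. loss)
        return agg_loss, exp_loss

Since the quadratic loss on [0, 1]-valued predictions and outcomes is convex and bounded by [0, 1], this setting satisfies the assumptions stated in step ii. below.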

  i.

    Recall from Sections 3 and 4 that the probabilistic aggregation method we are aiming at is defined as the weighted (\(w_{i,t}\)) average of the individual predictions (\(P_{i,t}\)), where the weights are a function of the per round successes \(s_{i,t}\) and the latter are just defined as the “inverse” (within the unit interval) of the losses \(l(P_{i,t}(v),val_{t}(v))\).

  ii.

    Let \(\eta =\sqrt {\frac {2\cdot \ln (n)}{T}}\); we assume T > 2⋅ln(n), so that 0 ≤ η < 1. Furthermore let l be convex and bounded by [0, 1]. Let us also restate the weights \(w^{av}_{i,t}\) recursively via defining coefficients c: Let \(c_{i,1}\) (for 1 ≤ i ≤ n) be 1. Then define recursively \(c_{i,t+1}=c_{i,t}\cdot e^{-\eta \cdot {\sum }_{m=1}^{k}l^{m}_{i,t}/k}\), where \(l^{m}_{i,t}=l(P_{i,t}(v_{m}),val_{t}(v_{m}))\) is the loss of i at round t with respect to its prediction of value \(v_{m}\).

  iii.

    By definition of c we get the following equalities about the ratio of the denominators used in normalisation of the weights (the normalising denominator for t + 1 and that of t):

    $$ \begin{array}{@{}rcl@{}} \frac{\sum\limits_{i=1}^{n} c_{i,t+1}}{\sum\limits_{j=1}^{n} c_{j,t}}&=&\sum\limits_{i=1}^{n}\frac{c_{i,t+1}}{\sum\limits_{j=1}^{n} c_{j,t}}=\sum\limits_{i=1}^{n}\frac{c_{i,t} \cdot e^{-\eta\cdot\sum\limits_{m=1}^{k} l^{m}_{i,t}/k}}{\sum\limits_{j=1}^{n} c_{j,t}}\\ &=&\sum\limits_{i=1}^{n} w^{av}_{i,t} \cdot e^{-\eta\cdot\sum\limits_{m=1}^{k} l^{m}_{i,t}/k} \end{array} $$

    In what follows we abbreviate \(\sum \limits _{m=1}^{k} l^{m}_{i,t}/k\) simply by \({\Sigma } l_{i,t}\).

  iv.

    By the inequality \(e^{-x}\leq 1-x+\frac {x^{2}}{2}\) (valid for all x ≥ 0) we get the instance:

    $$ e^{-\eta\cdot{\Sigma} l_{i,t}}~~\leq~~1-\eta\cdot{\Sigma} l_{i,t}+\frac{\eta^{2}\cdot\left( {\Sigma} l_{i,t}\right)^{2}}{2} $$

    Note that, due to the assumptions in ii., 0 ≤ η < 1, and, due to the boundedness of the loss l by [0, 1], \(\eta \cdot {\Sigma } l_{i,t}\in [0,1]\).

  v.

    By substituting the right term in the inequality of iv. for the e-term in iii. we get:

    $$ \begin{array}{@{}rcl@{}} \frac{\sum\limits_{i=1}^{n} c_{i,t+1}}{\sum\limits_{j=1}^{n} c_{j,t}}&\leq&\sum\limits_{i=1}^{n} w^{av}_{i,t}\cdot \left( 1-\eta\cdot{\Sigma} l_{i,t}+\frac{\eta^{2}\cdot\left( {\Sigma} l_{i,t}\right)^{2}}{2}\right)\\ && \text{and by arithmetic transformation:}\\ &\leq& \sum\limits_{i=1}^{n} w^{av}_{i,t} - \left( \eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\frac{\eta^{2}}{2}\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot\left( {\Sigma} l_{i,t}\right)^{2}\right)\right)\\ && \text{By the normalisation of \textit{w}:~}\sum\limits_{i=1}^{n} w^{av}_{i,t}=1\text{, so:}\\ &\leq& 1 -\left( \eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\frac{\eta^{2}}{2}\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot\left( {\Sigma} l_{i,t}\right)^{2}\right)\right)\\ && \text{By taking the} \ln \text{on both sides of the inequality:}\\ \ln\left( \frac{\sum\limits_{i=1}^{n} c_{i,t+1}}{\sum\limits_{j=1}^{n} c_{j,t}}\right)&\leq& \ln\left( 1-\left( \eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\frac{\eta^{2}}{2}\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot\left( {\Sigma} l_{i,t}\right)^{2}\right)\right)\right)\\ \end{array} $$
  vi.

    By the inequality \(e^{-x}\geq 1-x\) (valid for any x) we get \(\ln (e^{-x})\geq \ln (1-x)\) and hence \(-x\geq \ln (1-x)\). So, as an instance:

    $$ \begin{array}{@{}rcl@{}} &&-\left( \eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\frac{\eta^{2}}{2}\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot\left( {\Sigma} l_{i,t}\right)^{2}\right)\right) \\ &&\geq\ln\left( 1-\left( \eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\frac{\eta^{2}}{2}\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot\left( {\Sigma} l_{i,t}\right)^{2}\right)\right)\right) \end{array} $$

    Verify that, due to the assumption in ii. that 0 ≤ η < 1, the boundedness of the loss l by [0, 1], as well as the normalisation of w, our instance of x lies within [0, 1].

  vii.

    By substituting the left (upper) term in the inequality of vi. for the right term in the inequality in v. we get:

    $$ \begin{array}{ll} \ln\left( \frac{\sum\limits_{i=1}^{n} c_{i,t+1}}{\sum\limits_{j=1}^{n} c_{j,t}}\right)\leq-\left( \eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\frac{\eta^{2}}{2}\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot\left( {\Sigma} l_{i,t}\right)^{2}\right)\right)\\ \text{and by arithmetic transformation:}\\ \leq\frac{\eta^{2}}{2}\cdot\underbrace{\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot\left( {\Sigma} l_{i,t}\right)^{2}\right)}_{\leq1}-\eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)\\ \text{\dots{} due to~}\sum\limits_{i=1}^{n} w^{av}_{i,t}=1\text{~and~}l\in [0,1]\text{, so:}\\ \leq\frac{\eta^{2}}{2}\cdot 1 -\eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right) \end{array} $$
  viii.

    So we arrive at the following inequality (from vii.):

    $$ \ln\left( \sum\limits_{i=1}^{n} c_{i,t+1}\right)-\ln\left( \sum\limits_{i=1}^{n} c_{i,t}\right)~~\leq~~\frac{\eta^{2}}{2}-\eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right) $$

    Now we can sum each side of the inequality over t from 1 to T:

    $$ \underbrace{\sum\limits_{t=1}^{T}\left( \underbrace{\ln\left( \sum\limits_{i=1}^{n} c_{i,t+1}\right)}_{=_{def}C_{t+1}}-\underbrace{\ln\left( \sum\limits_{i=1}^{n} c_{i,t}\right)}_{=_{def}C_{t}}\right)}_{\underset{=C_{T+1}-C_{1}}{=~(C_{T+1}-C_{T})+\cdots+(C_{3}-C_{2})+(C_{2}-C_{1})}}~~\leq~~\underbrace{\sum\limits_{t=1}^{T}\left( \frac{\eta^{2}}{2}-\eta\cdot\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)\right)}_{=\frac{T\cdot\eta^{2}}{2}-\eta\cdot\sum\limits_{t=1}^{T}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)} $$

    So, we arrive at:

    $$ \ln\left( \sum\limits_{i=1}^{n} c_{i,T+1}\right)-\ln\underbrace{\left( \sum\limits_{i=1}^{n} c_{i,1}\right)}_{=n\ }~~\leq~~\frac{T\cdot\eta^{2}}{2}-\eta\cdot\sum\limits_{t=1}^{T}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right) $$

    Hence:

    $$ \ln\left( \sum\limits_{i=1}^{n} c_{i,T+1}\right)-\ln(n)~~\leq~~\frac{T\cdot\eta^{2}}{2}-\eta\cdot\sum\limits_{t=1}^{T}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right) $$

    Recall that \(c_{i,t+1}\) carries the cumulative loss of i up to round t (multiplied by − η) in its exponent, and we are after a bound for the regret with respect to the best predictor; hence we concentrate on the predictor with minimal cumulative loss up to T. Let us denote this predictor by b (\(b=(\iota i)({\sum }_{t=1}^{T}{\Sigma } l_{i,t}=\min ({\sum }_{t=1}^{T}{\Sigma } l_{1,t},\dots ,{\sum }_{t=1}^{T}{\Sigma } l_{n,t}))\)). If there are several such predictors, we can pick one of them arbitrarily. Now:

    $$ \ln(c_{b,T+1})~~\leq~~\ln\left( \sum\limits_{i=1}^{n} c_{i,T+1}\right) $$

    Hence:

    $$ \ln(c_{b,T+1})-\ln(n)~~\leq~~\frac{T\cdot\eta^{2}}{2}-\eta\cdot\sum\limits_{t=1}^{T}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right) $$
  ix.

    By definition of c:

    $$ c_{b,T+1}=\underbrace{c_{b,1}\cdot\prod\limits_{t=1}^{T}e^{-\eta\cdot{\Sigma} l_{b,t}}}_{\underset{=\exp\left( -\eta\cdot\sum\limits_{t=1}^{T}{\Sigma} l_{b,t}\right)}{=e^{-\eta\cdot({\Sigma} l_{b,1}+{\Sigma} l_{b,2}+\cdots+{\Sigma} l_{b,T})}}} $$

    So:

    $$ \ln(c_{b,T+1})=\ln\left( e^{-\eta\cdot\sum\limits_{t=1}^{T}{\Sigma} l_{b,t}}\right)=-\eta\cdot\sum\limits_{t=1}^{T}{\Sigma} l_{b,t} $$

    Substituting this for \(\ln (c_{b,T+1})\) in the last inequality of viii., we get:

    $$ -\eta\cdot\sum\limits_{t=1}^{T}{\Sigma} l_{b,t}-\ln(n)~~\leq~~\frac{T\cdot\eta^{2}}{2}-\eta\cdot\sum\limits_{t=1}^{T}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right) $$

    And by arithmetical transformation:

    $$ \sum\limits_{t=1}^{T}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\sum\limits_{t=1}^{T}{\Sigma} l_{b,t}~~\leq~~\frac{T\cdot\eta}{2}+\frac{\ln(n)}{\eta} $$

    If we substitute for η in accordance with ii: \(\eta =\sqrt {\frac {2\cdot \ln (n)}{T}}\), we get:

    $$ \sum\limits_{t=1}^{T}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot{\Sigma} l_{i,t}\right)-\sum\limits_{t=1}^{T}{\Sigma} l_{b,t}~~\leq~~\sqrt{2\cdot\ln(n)\cdot T} $$

    What is left is to employ the left term of the difference in the inequality above in order to prove a bound for the regret of the aggregating (meta-inductive) method.

  x.

    According to (AGGR), \(P_{aggr}\) predicts as follows: \(P_{aggr,t}(v_{m})=\sum \limits _{i=1}^{n} w^{av}_{i,t}\cdot P_{i,t}(v_{m})\). Hence its loss for value \(v_{m}\) is: \(l\left (\sum \limits _{i=1}^{n}(w^{av}_{i,t}\cdot P_{i,t}(v_{m})),val_{t}(v_{m})\right )\). And hence its average cumulative loss is:

    $$\sum\limits_{t=1}^{T}\sum\limits_{m=1}^{k} l\left( \sum\limits_{i=1}^{n}(w^{av}_{i,t}\cdot P_{i,t}(v_{m})),val_{t}(v_{m})\right)/k$$

    Since l is convex (according to ii.), we get:

    $$\sum\limits_{m=1}^{k} l\left( \sum\limits_{i=1}^{n}(w^{av}_{i,t}\cdot P_{i,t}(v_{m})),val_{t}(v_{m})\right)/k~~\leq~~\sum\limits_{m=1}^{k}\sum\limits_{i=1}^{n}\left( w^{av}_{i,t}\cdot l(P_{i,t}(v_{m}),val_{t}(v_{m}))\right)/k$$

    (I.e.: The loss of a weighted average of predictions is smaller than or equal to the weighted average of the losses of the predictions.) Hence, from the last inequality in ix. and the convexity of l we get:

    $$ \begin{array}{l} \underbrace{\sum\limits_{t=1}^{T}\sum\limits_{m=1}^{k}\left( l\left( \sum\limits_{i=1}^{n}(w^{av}_{i,t}\cdot P_{i,t}(v_{m})),val_{t}(v_{m})\right)\right)\!/k-\!\!\sum\limits_{t=1}^{T}\sum\limits_{m=1}^{k} l(P_{b,t}(v_{m}),val_{t}(v_{m}))/k}_{=l^{av}_{aggr,T}\cdot T-l^{av}_{b,T}\cdot T}\\ ~~\leq~~\sqrt{2\cdot\ln(n)\cdot T} \end{array} $$
  xi.

    Now, since \(s^{av}_{i,T}=1 - l^{av}_{i,T}\), this means that:

    $$s^{av}_{b,T}-s^{av}_{aggr,T}~~\leq~~\frac{const}{\sqrt{T}},\qquad\text{where } const=\sqrt{2\cdot\ln(n)} $$

    By applying the above-mentioned doubling trick, a bound of this form holds for every round t (without fixing T in advance), hence:

    $$\lim\limits_{t\rightarrow\infty}\left(s^{av}_{b,t}-s^{av}_{aggr,t}\right)\leq0$$

    Since \(P_{b}\) was the method with the least cumulative loss up to t (we defined b this way in viii.), this bound holds also with respect to all other predictors (for all 1 ≤ i ≤ n). □
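As a rough numerical illustration (again our addition, not part of the paper), one can feed randomly generated predictions into the aggregate sketch given before the proof and check that the cumulative average loss of the aggregator stays within \(\sqrt{2\cdot\ln(n)\cdot T}\) of that of the best predictor; the variable names below are merely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    T, n, k = 5000, 10, 4
    outcomes = rng.integers(0, 2, size=(T, k)).astype(float)   # binary events val_t(v_m)
    noise = rng.uniform(0.1, 0.4, size=n)                      # predictors of varying quality
    predictions = np.clip(
        outcomes[:, None, :] + rng.normal(0.0, noise[None, :, None], size=(T, n, k)),
        0.0, 1.0)

    agg_loss, exp_loss = aggregate(predictions, outcomes, T)   # sketch from before the proof
    regret = agg_loss.sum() - exp_loss.sum(axis=0).min()       # cumulative regret w.r.t. best b
    print(regret, np.sqrt(2 * np.log(n) * T))                  # by the theorem, regret <= bound

By the result just proven (here T = 5000 > 2⋅ln(10), so 0 ≤ η < 1), the first printed number cannot exceed the second, and dividing both by T reproduces the \(const/\sqrt{T}\) behaviour of step xi.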


Cite this article

Feldbacher-Escamilla, C.J., Schurz, G. Optimal probability aggregation based on generalized Brier scoring. Ann Math Artif Intell 88, 717–734 (2020). https://doi.org/10.1007/s10472-019-09648-4
