
Online Aggregation of Probabilistic Forecasts Based on the Continuous Ranked Probability Score

  • MATHEMATICAL MODELS AND COMPUTATIONAL METHODS
  • Published: Journal of Communications Technology and Electronics 65, 662–676 (2020). https://doi.org/10.1134/S1064226920060285

V. V. V’yugin and V. G. Trunov

Abstract—Methods for generating online predictions in the form of probability distributions of future outcomes are considered. The difference between a probabilistic forecast (a probability distribution) and the numerical outcome is measured by a loss function (scoring rule). In practical statistics, the continuous ranked probability score (CRPS) is often used to measure the discrepancy between probabilistic forecasts and (quantitative) outcomes. The paper considers the case where several competing methods (experts) give their online predictions as distribution functions. An algorithm for the online aggregation of these distribution functions is proposed. Performance bounds for the algorithm are obtained in the form of a comparison of its cumulative loss with the losses of the expert hypotheses. Unlike existing bounds, the proposed bounds do not depend on time. The results of numerical experiments illustrating the proposed methods are presented.
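To make the scoring rule concrete, the following is a minimal sketch of computing the CRPS of a forecast distribution function against a realized outcome; the interval [0, 1], the grid size, and the uniform forecast in the usage line are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def crps(cdf, y, a=0.0, b=1.0, n_grid=10_000):
    """CRPS of a forecast distribution function `cdf` on [a, b]
    against the realized outcome y:
        CRPS(F, y) = integral over [a, b] of (F(u) - 1_{u >= y})^2 du,
    approximated by a Riemann sum on a uniform grid."""
    u = np.linspace(a, b, n_grid)
    indicator = (u >= y).astype(float)  # step function jumping at the outcome
    du = (b - a) / (n_grid - 1)
    return float(np.sum((cdf(u) - indicator) ** 2) * du)

# Usage: a uniform forecast on [0, 1] with outcome y = 0.3;
# the exact value is y^3/3 + (1 - y)^3/3 = 0.12333...
print(crps(lambda u: np.clip(u, 0.0, 1.0), 0.3))
```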




Notes

  1. Here \(\mathbb{1}_{u \geqslant y} = 1\) if \(u \geqslant y\); otherwise it is 0.

  2. Exact definitions can be found in Section 2.

  3. Rule (5) or (6) can be employed (see below).

  4. A distribution function is a nondecreasing function F defined on an interval [a, b] such that F(a) = 0 and F(b) = 1.

  5. Rule (6) can be similarly employed.

  6. A regret bound of O(ln(TN)) can be obtained for the corresponding algorithm by using a time-varying parameter α.

  7. Note that the task of the aggregation algorithm is to adapt to changes as quickly as possible by increasing the weight of the currently leading model; a generic sketch of such a weight update is given after these notes.
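The following is not the algorithm from the paper but a generic Fixed Share-style weight update in the spirit of tracking-the-best-expert schemes, shown only to illustrate this adaptation mechanism; the learning rate eta and the switching rate alpha are illustrative parameters.

```python
import numpy as np

def fixed_share_update(weights, losses, eta=2.0, alpha=0.01):
    """One round of a generic Fixed Share-style update: exponential
    reweighting by the incurred losses, followed by mixing a fraction
    alpha of the mass uniformly, so that an expert that starts leading
    after a regime change can regain weight quickly."""
    w = weights * np.exp(-eta * losses)       # exponential weighting step
    w /= w.sum()
    return (1 - alpha) * w + alpha / len(w)   # share step

# Usage: three experts, the second one currently performs best.
w = np.full(3, 1.0 / 3.0)
w = fixed_share_update(w, np.array([0.5, 0.1, 0.9]))
print(w)  # weight shifts toward the second expert
```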


Funding

This work was supported by the Russian Science Foundation, project no. 20-01-00203.

Author information


Corresponding author

Correspondence to V. V. V’yugin.

Additional information

Translated by A. Chikishev

SUBSTITUTION FUNCTION

For an arbitrary loss function \(\lambda(\gamma, \omega)\) (γ ∈ [0, 1] and ω ∈ {0, 1}), we consider the parametric curve in the plane

$$\left( e^{-\eta\lambda(\gamma, 0)},\; e^{-\eta\lambda(\gamma, 1)} \right).$$
(A1)

We consider the scenario in which the curve is concave. The concavity condition is written as

$$x'(\gamma)\,y''(\gamma) - x''(\gamma)\,y'(\gamma) \geqslant 0,$$
(A2)

for all γ, where \(x(\gamma) = e^{-\eta\lambda(\gamma, 0)}\) and \(y(\gamma) = e^{-\eta\lambda(\gamma, 1)}\).

In particular, for the quadratic loss function \(\lambda(\gamma, \omega) = (\gamma - \omega)^2\), we have \(x(\gamma) = e^{-\eta\gamma^2}\) and \(y(\gamma) = e^{-\eta(\gamma - 1)^2}\).

After simple transformations, inequality (A2) turns out to be equivalent to the inequality

$$\eta\gamma(1 - \gamma) \leqslant \frac{1}{2}.$$

The quantity \(\gamma(1 - \gamma)\) attains its maximum value of 1/4 on 0 ≤ γ ≤ 1, so the concavity condition is satisfied for all γ whenever 0 < η ≤ 2.
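As a quick numerical check of this equivalence (a sketch using the quadratic-loss derivatives that follow from x(γ) and y(γ) above), the left-hand side of (A2) reduces, up to a positive exponential factor, to \(4\eta^2\left( 1 - 2\eta\gamma(1 - \gamma) \right)\):

```python
import numpy as np

def concavity_lhs(gamma, eta):
    """x'(g)*y''(g) - x''(g)*y'(g) for the quadratic loss, divided by the
    positive factor exp(-eta*g**2) * exp(-eta*(g-1)**2); analytically this
    equals 4*eta**2 * (1 - 2*eta*g*(1-g)), so (A2) holds iff eta*g*(1-g) <= 1/2."""
    x1 = -2 * eta * gamma
    x2 = -2 * eta + 4 * eta**2 * gamma**2
    y1 = -2 * eta * (gamma - 1)
    y2 = -2 * eta + 4 * eta**2 * (gamma - 1) ** 2
    return x1 * y2 - x2 * y1

g = np.linspace(0.0, 1.0, 101)
print(bool((concavity_lhs(g, 2.0) >= -1e-9).all()))  # True: concave at eta = 2
print(bool((concavity_lhs(g, 2.5) >= -1e-9).all()))  # False: fails near gamma = 1/2
```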

The η-mixability condition means that, for any distribution \(\mathbf{w} = (w_1, \ldots, w_N)\) on a set of N experts and any forecasts \(\mathbf{f} = (f_1, \ldots, f_N)\), we can find a γ* for which the inequalities

$$e^{-\eta\lambda(\gamma^*, \omega)} \geqslant \sum\limits_{i=1}^N w_i e^{-\eta\lambda(f_i, \omega)},$$
(A3)

are satisfied at ω = 0, 1. The points \(\left( e^{-\eta\lambda(f_i, 0)}, e^{-\eta\lambda(f_i, 1)} \right)\) (i = 1, …, N) lie on curve (A1), and their convex combination (point M) lies inside the convex region bounded by this curve. Condition (A3) means that the abscissa and ordinate of point N = \(\left( e^{-\eta\lambda(\gamma^*, 0)}, e^{-\eta\lambda(\gamma^*, 1)} \right)\) are no less than the abscissa and ordinate of point M. To find point N, draw the line through the origin and point M: it meets curve (A1) at point N (see Fig. 1), since along this line both coordinates of M are scaled by the same factor. Forecast γ* is calculated from the condition

$$\frac{e^{-\eta\lambda(\gamma^*, 1)}}{e^{-\eta\lambda(\gamma^*, 0)}} = \frac{\sum\limits_{i=1}^N w_i e^{-\eta\lambda(f_i, 1)}}{\sum\limits_{i=1}^N w_i e^{-\eta\lambda(f_i, 0)}}.$$
(A4)

For the quadratic loss function, equality (A4) yields the following expression for γ*:

$$\gamma^* = \mathrm{Subst}(\mathbf{f}, \mathbf{w}) = \frac{1}{2} - \frac{1}{2\eta}\ln\frac{\sum\limits_{i=1}^N w_i e^{-\eta f_i^2}}{\sum\limits_{i=1}^N w_i e^{-\eta(f_i - 1)^2}}.$$

The mixability condition for the quadratic loss function makes it possible to use η = 2.
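A minimal sketch of this substitution function for the quadratic loss, together with a numerical check that the resulting γ* satisfies the mixability inequality (A3) at η = 2; the expert forecasts and weights are arbitrary illustrative values:

```python
import numpy as np

def subst(f, w, eta=2.0):
    """Substitution forecast gamma* for the quadratic loss:
    gamma* = 1/2 - (1/(2*eta)) * log( sum_i w_i e^{-eta f_i^2}
                                      / sum_i w_i e^{-eta (f_i - 1)^2} )."""
    num = np.sum(w * np.exp(-eta * f**2))
    den = np.sum(w * np.exp(-eta * (f - 1) ** 2))
    return 0.5 - np.log(num / den) / (2 * eta)

f = np.array([0.1, 0.4, 0.9])   # expert forecasts
w = np.array([0.5, 0.3, 0.2])   # expert weights (sum to 1)
g = subst(f, w)
for omega in (0.0, 1.0):        # inequality (A3) at omega = 0 and omega = 1
    lhs = np.exp(-2.0 * (g - omega) ** 2)
    rhs = np.sum(w * np.exp(-2.0 * (f - omega) ** 2))
    assert lhs >= rhs - 1e-12
print(g)
```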

It can easily be shown that, for any ω ∈ [0, 1], the function \(f(\gamma) = e^{-\eta(\gamma - \omega)^2}\) is concave in γ ∈ [0, 1] for 0 < η < 1/2. In this case, the quantity \(\gamma^* = \sum\nolimits_{i=1}^N w_i f_i\) satisfies inequality (A3) for any 0 < η < 1/2, by the definition of concavity.
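In this small-η regime, the check is a direct application of Jensen's inequality to the concave function above; a sketch with the same illustrative forecasts and weights as before:

```python
import numpy as np

eta = 0.4                       # any 0 < eta < 1/2 works here
f = np.array([0.1, 0.4, 0.9])   # expert forecasts
w = np.array([0.5, 0.3, 0.2])   # expert weights (sum to 1)
g = float(np.sum(w * f))        # plain weighted-average forecast
for omega in (0.0, 1.0):
    lhs = np.exp(-eta * (g - omega) ** 2)
    rhs = np.sum(w * np.exp(-eta * (f - omega) ** 2))
    assert lhs >= rhs           # (A3) holds by concavity (Jensen's inequality)
print(g)
```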
