Abstract—Methods for generating predictions online and in the form of probability distributions of future outcomes are considered. The difference between the probabilistic forecast (probability distribution) and the numerical outcome is measured using the loss function (scoring rule). In practical statistics, the continuous ranked probability score (CRPS) is often used to estimate the discrepancy between probabilistic forecasts and (quantitative) outcomes. The paper considers the case when several competing methods (experts) give their online predictions as distribution functions. An algorithm is proposed for online aggregation of these distribution functions. The performance bounds of the proposed algorithm are obtained in the form of a comparison of the cumulative loss of the algorithm and the loss of expert hypotheses. Unlike existing estimates, the proposed estimates do not depend on time. The results of numerical experiments illustrating the proposed methods are presented.
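To make the loss measure concrete, here is a minimal numerical sketch of the CRPS: given a forecast distribution function F on [a, b] and an outcome y, it integrates the squared difference between F and the step function of the outcome. The function name and the midpoint-rule discretization are illustrative choices, not the paper's implementation.

```python
import math

def crps(F, y, a=0.0, b=1.0, n=2000):
    """Continuous ranked probability score: the integral over [a, b] of
    (F(u) - 1_{u >= y})**2 du, approximated by the midpoint rule.
    F is the forecast distribution function, y the observed outcome."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        u = a + (k + 0.5) * h
        indicator = 1.0 if u >= y else 0.0
        total += (F(u) - indicator) ** 2 * h
    return total

# Uniform forecast on [0, 1] against the outcome y = 0.5: exact CRPS is 1/12.
print(crps(lambda u: u, 0.5))  # ≈ 0.08333
```

A degenerate forecast concentrated exactly at the outcome scores zero, which is the sense in which CRPS rewards sharp, well-calibrated distributions.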
Notes
Here \({{\mathbb{1}}_{\{ u \geqslant y\} }}\) = 1 if u ≥ y, and 0 otherwise.
Exact definitions can be found in Section 2.
Rules (5) or (6) can be employed (see below).
A distribution function is a nondecreasing function F defined on the interval [a, b] such that F(a) = 0 and F(b) = 1.
Rule (6) can be similarly employed.
A regret bound of O(ln(TN)) can be obtained for the corresponding algorithm by using a time-varying parameter α.
Note that the task of the aggregation algorithm is to adapt as quickly as possible to changes and to increase the weight of the currently leading model.
Funding
This work was supported by the Russian Science Foundation, project no. 20-01-00203.
Translated by A. Chikishev
SUBSTITUTION FUNCTION
For an arbitrary loss function \(\lambda (\gamma ,\omega )\) (γ ∈ [0, 1] and ω ∈ {0, 1}), we consider the parametric curve on the plane

$$\left( x(\gamma ),y(\gamma ) \right) = \left( e^{ - \eta \lambda (\gamma ,0)},\;e^{ - \eta \lambda (\gamma ,1)} \right),\quad \gamma \in [0,1].\quad\quad ({\text{A1}})$$
We consider the scenario in which this curve is concave. The concavity condition is written as

$$y''(\gamma )\,x'(\gamma ) - x''(\gamma )\,y'(\gamma ) \geqslant 0\quad\quad ({\text{A2}})$$
for all γ, where x(γ) = \({{e}^{{ - \eta \lambda (\gamma ,0)}}}\) and y(γ) = \({{e}^{{ - \eta \lambda (\gamma ,1)}}}\).
In particular, for quadratic loss function \(\lambda (\gamma ,\omega )\) = \({{(\gamma - \omega )}^{2}}\), we have x(γ) = \({{e}^{{ - \eta {{\gamma }^{2}}}}}\) and y(γ) = \({{e}^{{ - \eta {{{\left( {\gamma - 1} \right)}}^{2}}}}}\).
After simple transformations, inequality (A2) is equivalent to the inequality

$$2\eta \gamma (1 - \gamma ) \leqslant 1.$$
The quantity \(\gamma (1 - \gamma )\) attains its maximum value of 1/4 on 0 ≤ γ ≤ 1, so the concavity condition is satisfied for all γ whenever 0 < η ≤ 2.
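As a sanity check, this concavity bound can be verified numerically. The sketch below (illustrative code, not from the paper) evaluates the sign of the parametric concavity expression \(y''(\gamma )x'(\gamma ) - x''(\gamma )y'(\gamma )\) for the square-loss curve x(γ) = e^{−ηγ²}, y(γ) = e^{−η(γ−1)²}: it is nonnegative everywhere for η = 2 and goes negative for η = 3, in line with the bound 0 < η ≤ 2.

```python
import math

def concavity_term(gamma, eta):
    """y''(g) * x'(g) - x''(g) * y'(g) for the square-loss curve
    x(g) = exp(-eta * g**2), y(g) = exp(-eta * (g - 1)**2).
    Since x'(g) < 0 on (0, 1], the curve is concave iff this is >= 0."""
    x = math.exp(-eta * gamma ** 2)
    y = math.exp(-eta * (gamma - 1) ** 2)
    xp, yp = -2 * eta * gamma * x, -2 * eta * (gamma - 1) * y
    xpp = (4 * eta ** 2 * gamma ** 2 - 2 * eta) * x
    ypp = (4 * eta ** 2 * (gamma - 1) ** 2 - 2 * eta) * y
    return ypp * xp - xpp * yp  # simplifies to 4*eta**2 * (1 - 2*eta*g*(1-g)) * x * y

grid = [k / 100 for k in range(101)]
assert all(concavity_term(g, 2.0) >= -1e-12 for g in grid)  # eta = 2: concave
assert any(concavity_term(g, 3.0) < 0 for g in grid)        # eta = 3: concavity breaks
```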
The η-mixability condition means that, for any distribution w = \(({{w}_{1}},...,{{w}_{N}})\) on a set of N experts and any forecasts f = \(({{f}_{1}},...,{{f}_{N}})\), we can find γ* for which the inequalities

$$e^{ - \eta \lambda (\gamma ^{ *},\omega )} \geqslant \sum\limits_{i = 1}^N w_i e^{ - \eta \lambda (f_i,\omega )}\quad\quad ({\text{A3}})$$
are satisfied at ω = 0, 1. The points \(\left( {{{e}^{{ - \eta \lambda ({{f}_{i}},0)}}},{{e}^{{ - \eta \lambda ({{f}_{i}},1)}}}} \right)\), i = 1, …, N, lie on curve (A1), and their convex combination (point M) lies inside the convex region bounded by this curve. Condition (A3) means that the abscissa and the ordinate of the point N = \(({{e}^{{ - \eta \lambda (\gamma ^{ *},0)}}},{{e}^{{ - \eta \lambda (\gamma ^{ *},1)}}})\) are no less than the abscissa and the ordinate of point M. To find point N, the ray drawn from the origin through point M is intersected with curve (A1) (see Fig. 1). The forecast γ* is then calculated from the condition

$$\frac{e^{ - \eta \lambda (\gamma ^{ *},1)}}{e^{ - \eta \lambda (\gamma ^{ *},0)}} = \frac{\sum\nolimits_{i = 1}^N w_i e^{ - \eta \lambda (f_i,1)}}{\sum\nolimits_{i = 1}^N w_i e^{ - \eta \lambda (f_i,0)}}.\quad\quad ({\text{A4}})$$
For a quadratic loss function, equality (A4) yields the following expression for γ*:

$$\gamma ^{ *} = \frac{1}{2} + \frac{1}{2\eta }\ln \frac{\sum\nolimits_{i = 1}^N w_i e^{ - \eta (f_i - 1)^2}}{\sum\nolimits_{i = 1}^N w_i e^{ - \eta f_i^2}}.$$
The mixability condition for the quadratic loss function makes it possible to use η = 2.
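The resulting aggregation step is straightforward to implement. The sketch below (hypothetical function name, η = 2 as the mixability condition allows) computes γ* for the quadratic loss from the weighted exponential mixtures at ω = 1 and ω = 0, and checks that the loss of γ* never exceeds the η-mixture loss for either outcome.

```python
import math

def substitution(weights, forecasts, eta=2.0):
    """Aggregated forecast gamma* for the quadratic loss:
    gamma* = 1/2 + (1 / (2*eta)) * ln(num / den), where num and den are the
    weighted exponential mixtures of the losses at omega = 1 and omega = 0."""
    num = sum(w * math.exp(-eta * (f - 1) ** 2) for w, f in zip(weights, forecasts))
    den = sum(w * math.exp(-eta * f ** 2) for w, f in zip(weights, forecasts))
    return 0.5 + math.log(num / den) / (2 * eta)

w, f, eta = [0.9, 0.1], [0.1, 0.9], 2.0
g = substitution(w, f, eta)
# Mixability check: the loss of gamma* does not exceed the eta-mixture loss.
for omega in (0.0, 1.0):
    mix = -math.log(sum(wi * math.exp(-eta * (fi - omega) ** 2)
                        for wi, fi in zip(w, f))) / eta
    assert (g - omega) ** 2 <= mix + 1e-12
```

Note that when all weight sits on one expert, γ* reduces to that expert's forecast, as it should.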
It can be easily shown that, for any ω ∈ [0, 1], function f(γ) = \({{e}^{{ - \eta {{{(\gamma - \omega )}}^{2}}}}}\) is concave with respect to γ ∈ [0, 1] for 0 < η < 1/2. In this case, quantity γ* = \(\sum\nolimits_{i = 1}^N {{{w}_{i}}{{f}_{i}}} \) satisfies inequality (A3) at any 0 < η < 1/2 in accordance with the definition of concavity.
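For this small-η regime, the sketch below (illustrative weights and forecasts, not from the paper) confirms numerically that the plain weighted mean of the forecasts satisfies the mixability inequality at η = 0.4 < 1/2.

```python
import math

# With eta = 0.4 (< 1/2), the plain weighted mean of the expert forecasts
# already satisfies the mixability inequality, by concavity of
# exp(-eta * (gamma - omega)**2) in gamma (Jensen's inequality).
eta, w, f = 0.4, [0.9, 0.1], [0.1, 0.9]
gamma = sum(wi * fi for wi, fi in zip(w, f))  # weighted mean forecast
for omega in (0.0, 1.0):
    mix = -math.log(sum(wi * math.exp(-eta * (fi - omega) ** 2)
                        for wi, fi in zip(w, f))) / eta
    assert (gamma - omega) ** 2 <= mix + 1e-12
```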
V’yugin, V.V., Trunov, V.G. Online Aggregation of Probabilistic Forecasts Based on the Continuous Ranked Probability Score. J. Commun. Technol. Electron. 65, 662–676 (2020). https://doi.org/10.1134/S1064226920060285