
International Journal of Forecasting

Volume 38, Issue 1, January–March 2022, Pages 193-208

Combining forecasts for universally optimal performance

https://doi.org/10.1016/j.ijforecast.2021.05.004

Abstract

There are two potential directions of forecast combination: combining for adaptation and combining for improvement. The former direction targets the performance of the best forecaster, while the latter attempts to combine forecasts to improve on the best forecaster. It is often useful to infer which goal is more appropriate so that a suitable combination method may be used. This paper proposes an AI-AFTER approach that can not only determine the appropriate goal of forecast combination but also intelligently combine the forecasts to automatically achieve the proper goal. As a result of this approach, the combined forecasts from AI-AFTER perform well universally in both adaptation and improvement scenarios. The proposed forecasting approach is implemented in our R package AIafter, which is available at https://github.com/weiqian1/AIafter.

Introduction

In many forecasting problems, the analyst has access to several different forecasts of the same response series. These forecasts might arise from models whose structure is known to the analyst, or they may be generated by unknown mechanisms. To utilize these candidate forecasts to accurately predict future response values, the analyst can either select one of the forecasting procedures that seems to perform well or combine the candidate forecasts in some way.

Much work has been devoted to the merits of forecast combination, taking both frequentist and Bayesian-type approaches across various prediction and forecasting scenarios. Many frequentist methods design and employ performance-based criteria to estimate theoretically optimal weights for a linear combination, seeking forecasting performance potentially superior to any of the individual candidate forecasts. The early classical work of Bates and Granger (1969) proposed minimizing a mean square error criterion, with the optimal weights estimated through the forecast error variance matrix, and Granger and Ramanathan (1984) formulated different linear regression frameworks for the optimal weights. By minimizing other performance measures such as forecast cross-validation and information criteria, weighted averaging strategies have been developed for specific classes of known statistical forecasting and prediction models, such as factor models (Cheng & Hansen, 2015), generalized linear models (Zhang et al., 2016), and spatial autoregressive models (Zhang & Yu, 2018), among many other useful cases. Despite the popularity of performance-based combination methods, it is also well known that weights estimated by sophisticated methods can deviate substantially from the targeted theoretical optimal weights because of weight estimation error and uncertainty (e.g., Claeskens et al., 2016, Smith and Wallis, 2009); this important and well-studied factor, among other possible factors such as structural breaks and new information (Lahiri et al., 2017, Qian et al., 2019b), contributes to the forecast combination puzzle (Hendry and Clements, 2004, Stock and Watson, 2004a): in practice, simple equally weighted averaging or its variants may outperform sophisticated alternatives.
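For concreteness, the two classical weighting schemes above can be sketched in a few lines of R; the forecast vectors f1, f2 and the realized series y below are simulated placeholders rather than data from any study, and the snippet is only an illustration of the estimators, not code from the AIafter package.

    # Illustrative R sketch (placeholder data): classical combination weights
    # for two candidate forecasts f1, f2 of a realized series y.
    set.seed(1)
    n  <- 100
    y  <- rnorm(n)                    # realized values (placeholder)
    f1 <- y + rnorm(n, sd = 0.8)      # candidate forecast 1
    f2 <- y + rnorm(n, sd = 1.2)      # candidate forecast 2

    # Bates-Granger (1969): weights minimizing the combined error variance,
    # estimated from the forecast error covariance matrix.
    E     <- cbind(y - f1, y - f2)
    Sigma <- cov(E)
    w_bg  <- solve(Sigma, rep(1, 2))
    w_bg  <- w_bg / sum(w_bg)         # normalize the weights to sum to one

    # Granger-Ramanathan (1984): weights from a linear regression of the
    # realized values on the candidate forecasts (here with an intercept).
    w_gr <- coef(lm(y ~ f1 + f2))

    combined_bg <- cbind(f1, f2) %*% w_bg
    combined_gr <- cbind(1, f1, f2) %*% w_gr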

In contrast to combining strategies that directly aim to obtain the optimal weights and thereby potentially improve on all the candidate forecasts, a class of aggregation methods recursively updates the combining weights through online re-weighting schemes (see, e.g., Lahiri et al., 2017, Yang, 2004). These methods typically take a less ambitious objective and only aim to match the performance of the best candidate forecast, with the promise of incurring a smaller cost for weight estimation. Indeed, Yang (2004) studied a representative of these methods, called AFTER, and established non-asymptotic forecast risk bounds that theoretically illustrate the heavier cost in forecast risk of attempting to improve on the candidates rather than only aiming to match the best-performing original candidate forecast; Lahiri et al. (2017) showed asymptotically that AFTER tends to place all of the weight on the best-performing candidate under some mild conditions. Lahiri et al. (2015) also suggested that the less aggressive combining objective is reasonable in view of forecast uncertainty under some appropriate measures. Besides the aforementioned frequentist approaches, Bayesian model averaging methods (Forte et al., 2018, Hoeting et al., 1999a, Steel, 2011) are likewise aligned with adapting to the best original candidate forecast, since the data generating process is often assumed to be one of the candidate forecasting models (De Luca et al., 2018), although these models are required to be known. Although different combination methods have been designed with different objectives, neither information on the data generating process nor the candidate forecasts' underlying statistical models is necessarily available in practice, so it is usually unknown a priori to an analyst whether improved forecast combination performance over all the original candidates is achievable; this can lead to an improper choice of combination method and undesirable forecasting performance. For example, blindly applying the AFTER method is expected to underperform if a linear combination of forecasts with optimal weights substantially outperforms every original candidate.
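The flavor of such online re-weighting can be conveyed by the following stylized R sketch, which down-weights candidates exponentially in their past squared errors with a fixed scale parameter lambda; this is a simplification of the actual AFTER algorithm of Yang (2004), which, among other refinements, also updates variance estimates, so it should be read only as a conceptual illustration.

    # Stylized sketch of an AFTER-type recursive re-weighting scheme
    # (a simplification of the actual AFTER algorithm of Yang (2004)).
    # fmat is a T x M matrix of candidate forecasts; y is the realized series.
    after_sketch <- function(fmat, y, lambda = 1) {
      M <- ncol(fmat)
      w <- rep(1 / M, M)                      # start from equal weights
      combined <- numeric(nrow(fmat))
      for (t in seq_len(nrow(fmat))) {
        combined[t] <- sum(w * fmat[t, ])     # combined forecast for time t
        # after y[t] is observed, down-weight candidates with large errors
        w <- w * exp(-lambda * (y[t] - fmat[t, ])^2)
        w <- w / sum(w)
      }
      list(forecast = combined, final_weights = w)
    }

Under such a scheme, a candidate whose cumulative squared error is markedly smaller than the others' quickly accumulates nearly all of the weight, which reflects the adaptation behavior described above.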

Despite ongoing questions about which combining methods to use and how to develop new combining methods for better performance, there is a consensus from existing empirical and theoretical research that combining the available candidate forecasts carries important benefits and often produces better results than selecting a single forecast candidate (e.g., Kourentzes et al., 2019, Stock and Watson, 2003, Yang, 2003). In a recent open forecast competition, the M4 competition (Makridakis et al., 2020), 100,000 time series were provided as a large-scale testing ground to assess the performance of forecasting methods, and over sixty research teams submitted forecasting results based on various methods for principled evaluation; notably, the benefit of forecast combination over selection was re-confirmed as one of the main conclusions in the highlighted results of the M4 competition (Makridakis et al., 2018). Two reasons partially explain these benefits. First, identifying which forecasting procedure is best among the candidates often involves substantial uncertainty. Depending on the noise realized in the forecast evaluation period, several different candidates may have a good chance of being selected as the best one by a selection procedure, and the winner-takes-all approach of forecast selection often yields post-selection forecasts with high variance. Second, different pieces of predictive information may be available to different forecasters; in this situation, a combination of the candidate forecasts has the potential to outperform even the best original candidate procedure because combining shares information across forecasts.

These two benefits of combining forecasts are closely related to the two different objectives of combining methods we briefly discussed earlier. Yang (2004) formally distinguishes these two objectives as combining for adaptation (CFA) and combining for improvement (CFI), where CFA targets the first benefit and CFI targets the second. Specifically, the objective of combining for adaptation is to achieve the best performance among the original candidate procedures while reducing the variance introduced by selection uncertainty. Combining for improvement, on the other hand, aims to improve on even the best procedure in the original candidate set by searching for an optimal combination of the candidates and directly estimating the optimal weights using performance-based criteria. Either of the two goals of forecast combination may be favored, depending on the nature of the data-generating process and the candidate forecasts. Taking an approach for adaptation when improvement is more appropriate can lead to missed opportunities to share information and improve on the original candidate forecasts; on the other hand, taking an approach for improvement when adaptation is more appropriate may result in elevated forecast risk (defined in Section 2.1) from the heavier cost of the more ambitious CFI objective. A more detailed explanation with heuristic illustrations of this issue is given in Section 2.
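In rough shorthand (our notation, not necessarily identical to the formal definitions in Section 2), with candidate forecasters δ_1, ..., δ_M and forecast risk R(·; n), the two targets can be written as

    CFA target:  R_CFA(n) = min over 1 ≤ j ≤ M of R(δ_j; n)
    CFI target:  R_CFI(n) = inf over w in W of R( Σ_{j=1}^{M} w_j δ_j ; n )

where W is a set of admissible weight vectors (for instance, the probability simplex). Since every individual candidate corresponds to a vertex weight vector in W, the CFI target risk is never larger than the CFA target risk, consistent with the ordering noted in Section 2.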

In this paper, we highlight the relationship between CFA and CFI, develop a testing procedure to assess whether we can improve on the best individual forecaster, and then design a new method to capture the benefits of combining under either goal given the data and candidate forecasts at hand. Specifically, our proposal is based on the AFTER method (Yang, 2004), which is designed for the combining goal of adaptation: it is guaranteed to perform nearly as well as (rather than significantly improve on) the best candidate forecasting procedure, without knowing in advance which procedure is best. However, what if all the original candidate forecasters perform poorly? In many situations, it is still possible to combine these candidates (sometimes referred to as “weak learners”) into a forecast that improves on even the best original candidate and thereby attains the goal of combining for improvement. Correspondingly, we propose an important extension of the AFTER method that combines for both adaptation (A) and improvement (I), so that it can also adapt to the goal of improvement when such a strategy has evident potential; for brevity, we call the proposed method AI-AFTER. We also implement the proposed method in a user-friendly R package named AIafter, which is available at https://github.com/weiqian1/AIafter.
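To fix ideas, the overall flow of the proposal can be sketched as follows in R; the function below is our own simplified illustration, not the interface or implementation of the AIafter package, and the final AFTER-type update is the same stylized re-weighting used in the sketch earlier in this section. The outcome of the test developed in Section 3 enters through the logical argument improvement_detected.

    # Simplified illustration of the AI-AFTER flow (NOT the AIafter package):
    # if the test of Section 3 detects room for improvement, improvement-type
    # combined forecasts join the candidate pool before AFTER-type weighting.
    ai_after_sketch <- function(fmat, y, improvement_fc, improvement_detected,
                                lambda = 1) {
      cand <- if (improvement_detected) cbind(fmat, improvement_fc) else fmat
      w <- rep(1 / ncol(cand), ncol(cand))
      combined <- numeric(nrow(cand))
      for (t in seq_len(nrow(cand))) {
        combined[t] <- sum(w * cand[t, ])             # forecast before y[t]
        w <- w * exp(-lambda * (y[t] - cand[t, ])^2)  # re-weight after y[t]
        w <- w / sum(w)
      }
      list(forecast = combined, final_weights = w)
    }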

Although this paper focuses on point forecasts, it is worth noting that significant progress has been made in the literature on the important topics of probability/density forecast combination and interval/quantile forecast combination (Granger et al., 1989, Wallis, 2005). In particular, density/probability forecast combination methods generally need assumptions on the statistical modeling forms of the candidate forecasts or the data generating processes. For example, Clements and Harvey (2011) established the optimal combination forms under several plausible data generating processes and studied the associated estimation of optimal weights under certain performance-based criteria. Following earlier work of Hall and Mitchell (2007), Geweke and Amisano (2011) considered optimal weighting schemes for a linear combination of candidate predictive model densities and showed that, in contrast to Bayesian model averaging, their method with optimal weights based on predictive scoring rules aims to achieve performance substantially better than any candidate predictive model. On the other hand, interval/quantile forecast combination methods (Granger et al., 1989) consider time-varying heteroscedastic forecast errors whose non-i.i.d. distributions can be estimated flexibly with different parametric or nonparametric density estimation approaches, and they do not require an analyst to have prior knowledge of either the statistical models or the parameters that lead to any of the candidate forecasts. For example, Trapero et al. (2019) proposed obtaining an optimal combination through minimization of a quantile loss criterion, while Shan and Yang (2009) proposed an AFTER-type method for quantile forecast combination that can perform nearly as well as the best original candidate. In this sense, the discussion and proposal of this paper on addressing the two different objectives of CFA and CFI for point forecast combination may also be relevant to interval/quantile forecast combination, which deserves separate careful study in its own right. We leave this interesting yet challenging extension beyond point forecasts to future investigation.

The remainder of the paper is structured as follows. In Section 2 we formally define the combining goals of adaptation and improvement, and we present illustrative simulations and examples that favor either of the combining objectives. Section 3 describes a statistical test for the potential to combine for improvement. The AI-AFTER method of combining forecasts for either adaptation or improvement is described in Section 4. Simulation and real data evaluation are given in Section 5 and Section 6, respectively. Section 7 gives brief concluding remarks.

Section snippets

Two objectives of forecast combination

The objectives of combining for adaptation/improvement (CFA vs. CFI) are understood in terms of the forecast risks they aim to achieve. We will see that the target risk of combining for improvement (that is, the risk using the optimal weights for combining) is always upper bounded by the target risk of combining for adaptation (that is, the risk of the best original candidate forecast). On the other hand, because of the extra cost in forecast risk that can be introduced by combining for

Our approach

We approach the potential of improvement from a hypothesis testing framework. As discussed before, the choice to combine for adaptation is a relatively conservative strategy. Therefore, we place CFA in the role of the null hypothesis and CFI as the alternative. The test recommends combining for improvement if the data provide evidence that CFI is a potentially useful strategy.

The choice to combine for improvement makes sense if, for the available n, R(Ψ_L; n) is significantly less than R(Ψ_0; n), so
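As one simple way to operationalize such a comparison (the actual test statistic is developed below in this section and may well differ), an analyst could contrast the hold-out errors of a regression-combined forecast with those of the best individual candidate using a one-sided Diebold-Mariano-type test, for example via dm.test from the R package forecast; the data below are simulated placeholders.

    # Hedged illustration only: does a regression-combined forecast
    # significantly beat the best individual candidate on a hold-out window?
    library(forecast)                            # provides dm.test()
    set.seed(2)
    n    <- 200
    x    <- matrix(rnorm(n * 3), n, 3)
    y    <- as.vector(x %*% c(1, 1, 1) + rnorm(n))      # placeholder data
    fmat <- x + matrix(rnorm(n * 3, sd = 0.5), n, 3)    # three noisy candidates

    train <- 1:100
    hold  <- 101:200
    w_hat  <- coef(lm(y[train] ~ fmat[train, ] - 1))    # regression-based weights
    f_comb <- as.vector(fmat[hold, ] %*% w_hat)
    best   <- which.min(colMeans((y[train] - fmat[train, ])^2))
    f_best <- fmat[hold, best]

    # One-sided DM-type test; the alternative is that the second set of
    # errors (combined forecast) is more accurate under squared error loss.
    dm.test(y[hold] - f_best, y[hold] - f_comb,
            alternative = "greater", h = 1, power = 2)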

Using the AFTER algorithm

The hypothesis test described in Section 3 indicates whether, given the data and candidate forecasts at hand, one should perform combining for improvement. If there is little evidence that any combined forecast can outperform the best individual forecast, then one may simply target the forecast risk of the best individual forecaster. As AFTER can provide protection against model selection uncertainty (Zou & Yang, 2004) and automatically adapt to changes over time in the data generating process

Simulation studies

In this section, we present simulation results for a linear regression setting and for a time series setting, followed by revisiting the two simulations of Section 2.2. In the linear regression setting, a large number of covariates help to determine the data generating process and are considered by M different candidate models. In the time series setting, past values of the response variable are used in the candidate models in addition to one or two covariates. In both settings, we
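The exact configurations are specified later in this section; purely to illustrate the flavor of the linear regression setting described here (with our own arbitrary dimensions and coefficients, not those used in the paper), candidate forecasts of this type could be generated along the following lines.

    # Purely illustrative mock-up of a linear-regression setting with M
    # candidate models; dimensions and coefficients are arbitrary choices,
    # not the simulation configurations used in this paper.
    set.seed(3)
    n <- 150; p <- 20; M <- 5
    X    <- matrix(rnorm(n * p), n, p)
    beta <- c(rep(1, 10), rep(0, p - 10))       # many covariates drive the DGP
    y    <- as.vector(X %*% beta + rnorm(n))

    train <- 1:100
    hold  <- 101:150
    # candidate m uses the first 2m covariates, fitted on the training window
    fmat <- sapply(1:M, function(m) {
      vars <- 1:(2 * m)
      fit  <- lm(y[train] ~ X[train, vars])
      as.vector(cbind(1, X[hold, vars]) %*% coef(fit))
    })
    dim(fmat)                                   # 50 hold-out forecasts x M candidates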

Data, forecasts, and combining methods

We next apply the method of AI-AFTER to forecast two measures of output growth for seven developed countries using data first analyzed in Stock and Watson (2003). Specifically, we forecast Y_{t+4h} = (100/h) ln(Q_{t+4h}/Q_t), where, depending on the analysis, Q is either a country’s real GDP (RGDP) or Index of Industrial Production (IP), t represents the current quarter at the time of forecasting, and h represents the forecasting horizon in terms of the number of years ahead. We consider forecasts for h=1
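To make the target variable concrete, the h-year-ahead growth measure above can be computed from a quarterly level series as in the short R sketch below; the series Q here is a simulated placeholder, not the Stock and Watson (2003) data.

    # Compute Y_{t+4h} = (100/h) * ln(Q_{t+4h} / Q_t) from a quarterly level
    # series Q (placeholder data, not the Stock-Watson series).
    growth_target <- function(Q, h) {
      k <- 4 * h                                # horizon converted to quarters
      n <- length(Q)
      (100 / h) * log(Q[(1 + k):n] / Q[1:(n - k)])
    }

    # mock quarterly RGDP-type level series, then the h = 1 target in percent
    Q    <- cumprod(c(100, 1 + rnorm(99, mean = 0.005, sd = 0.01)))
    y_h1 <- growth_target(Q, h = 1)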

Discussion

This work introduces a forecast combining approach, AI-AFTER, that performs well universally in both adaptation and improvement scenarios. By treating methods that attempt to combine for improvement, such as regression-based forecasts, as candidates to be considered, and by using a hypothesis test to detect the underlying forecast scenario, AI-AFTER adapts to the situation at hand, being aggressive or conservative as appropriate given the data and forecast candidates.

So far, our work has focused on the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

Qian’s research is partially supported by U.S. NSF grant DMS-1916376 and a JPMC Faculty Fellowship. We would like to thank the Editor, Associate Editor, and two anonymous referees for their valuable comments that helped to improve this manuscript significantly.

Supplementary materials

AIafter: The R package implementing our proposed AI-AFTER forecasting algorithm, together with the AFTER, BG, LR, and CLR methods, is available at the GitHub address: https://github.com/weiqian1/AIafter.

References

  • Makridakis, S., et al. (2018). The M4 Competition: Results, findings, conclusion and way forward. International Journal of Forecasting.
  • Makridakis, S., et al. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting.
  • Trapero, J. R., et al. (2019). Quantile forecast optimal combination to enhance safety stock estimation. International Journal of Forecasting.
  • Zhang, X., et al. (2018). Spatial weights matrix selection and model averaging for spatial autoregressive models. Journal of Econometrics.
  • Zou, H., et al. (2004). Combining time series models for forecasting. International Journal of Forecasting.
  • Bates, J. M., et al. (1969). The combination of forecasts. Operational Research Quarterly.
  • Breiman, L. (1996). Bagging predictors. Machine Learning.
  • Diebold, F. X., et al. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics.
  • Forte, A., et al. (2018). Methods and tools for Bayesian variable selection and model averaging in normal linear regression. International Statistical Review.
  • Freund, Y., et al. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences.
  • Granger, C. W. J., et al. (1984). Improved methods of combining forecasts. Journal of Forecasting.