The ensemble approach to forecasting: A review and synthesis

https://doi.org/10.1016/j.trc.2021.103357

Highlights

  • Review and synthesize methods of ensemble forecasting with a unifying framework.

  • As decision support tools, ensemble models systematically account for uncertainties.

  • Ensemble methods can include combining models, combining data, and ensembles of ensembles.

  • Transport ensemble models have the potential for improving accuracy and reliability.

Abstract

Ensemble forecasting is a modeling approach that combines data sources and models of different types, built on alternative assumptions and using distinct pattern recognition methods. The aim is to use all available information in predictions, without the limiting and arbitrary choices and dependencies that result from a single statistical or machine learning approach, a single functional form, or a limited data source. Uncertainties are systematically accounted for. Outputs of ensemble models can be presented as a range of possibilities to indicate the amount of uncertainty in modeling. We review methods and applications of ensemble models both within and outside of transport research. The review finds that ensemble forecasting generally improves forecast accuracy and robustness in many fields, particularly in weather forecasting, where the method originated. We note that ensemble methods are highly siloed across disciplines, and that both the knowledge and the application of ensemble forecasting are lacking in transport. In this paper we review and synthesize methods of ensemble forecasting within a unifying framework, categorizing ensemble methods into two broad and not mutually exclusive categories, namely combining models and combining data; this framework further extends to ensembles of ensembles. We apply ensemble forecasting to transport-related cases, which shows the potential of ensemble models to improve forecast accuracy and reliability. This paper sheds light on the apparatus of ensemble forecasting, which we hope contributes to a better understanding and wider adoption of ensemble models.

Introduction

The concept underlying ensemble forecasting has existed since time immemorial, such that most cultures have expressions for the ‘wisdom of the crowd’ in their language. In many disciplines and applications, however, modeling still relies on a single model. Most transport models are theory-driven ‘Newtonian’ physical models (Garrison and Levinson, 2014), which are used to represent what is theorized to be the true mechanism, and models with lower performance are labeled simply as ‘incorrect’. The rationale for applying physical models in transport is the assumption that a ‘true’ model really exists and can be described using a single mathematical expression.

The current performance of transport models is not satisfactory, and there have been many cases of erroneous transport forecasts (Boyce and Williams, 2015). Prediction accuracy in other fields, most notably in weather forecasting, has improved significantly over the years. This discrepancy in model performance between transport and weather forecasting suggests that there might be some fundamental issues with the tools used by transport modelers. While travel demand models (Meyer and Miller, 2001) have experienced no major changes for over half a century since their introduction, weather forecasting has benefited from an open-minded exploration of new modeling methods and from daily opportunities to validate new models against observations (Blum, 2019). This resulted in the adoption of ensemble models as standard practice and a significant improvement in forecast accuracy.

Ensemble forecasting serves two purposes: combining information, and pooling errors. The combination of information is achieved by considering different model assumptions and pattern recognition methods, so that the information extracted by different models is combined in an ensemble model. The pooling of error is achieved by consulting multiple sources. It is very difficult for complex systems, such as models and computer software, to be entirely free of errors; but since different models or sources are more likely to recognize the same feature than to repeat the same error, it is common in weather forecasting to obtain a value from several different models or software packages (Blum, 2019, Silver, 2012). The same idea of pooling error underlies the 2014 Nobel Prize in Chemistry, awarded to scientists who used repeated imaging to obtain resolution beyond the diffraction limit (Möckl et al., 2014).
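The pooling of errors can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes hypothetical base models whose errors are independent, unbiased, and of equal variance, which is an idealization. Under that assumption the root-mean-square error (RMSE) of an equal-weight average of n forecasts falls to roughly 1/sqrt(n) of the single-model RMSE. The function name `simulate_pooling` and all parameter values are illustrative.

```python
import random
import statistics


def simulate_pooling(n_models=5, n_cases=2000, noise_sd=1.0, seed=0):
    """Compare the RMSE of one model with that of an equal-weight ensemble.

    Assumption (illustrative only): every model observes the true value plus
    its own independent Gaussian error with the same standard deviation.
    """
    rng = random.Random(seed)
    single_sq_err = []
    ensemble_sq_err = []
    for _ in range(n_cases):
        truth = rng.uniform(0.0, 100.0)
        # One forecast per hypothetical model, each with an independent error.
        forecasts = [truth + rng.gauss(0.0, noise_sd) for _ in range(n_models)]
        single_sq_err.append((forecasts[0] - truth) ** 2)
        ensemble_sq_err.append((statistics.fmean(forecasts) - truth) ** 2)
    rmse_single = statistics.fmean(single_sq_err) ** 0.5
    rmse_ensemble = statistics.fmean(ensemble_sq_err) ** 0.5
    return rmse_single, rmse_ensemble


rmse_single, rmse_ensemble = simulate_pooling()
print(f"single model RMSE:  {rmse_single:.3f}")
print(f"ensemble of 5 RMSE: {rmse_ensemble:.3f}")
```

In practice, model errors are usually correlated, so the gain from pooling is smaller than in this idealized setting; the sketch only shows why errors that are not shared across models tend to cancel.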

This review and synthesis contributes to the literature with a wider scope than technical reviews of ensemble methods. Even in weather forecasting, where ensemble methods originated, the ensemble methods in use are not comprehensive. This review covers both ensemble models that make a single simultaneous prediction, and iterative models that use model outputs as new inputs, in which forecast uncertainties resulting from initial conditions and accumulated error (i.e. chaos theory) tend to grow. These two types of ensemble models are generally discussed separately in different contexts because, historically, chaos theory and error accumulation were more specific to weather forecasting, and the combination of different model formulations and assumptions is not a major interest in weather forecasting. However, both types of ensemble models are relevant to transport-related cases, so the broad scope of this review is necessary for the application of ensemble models in transport. We also review ensemble methods used in different disciplines, including the use of expert opinions, judgmental adjustments to model predictions, and meta-analysis, covering both methodical ensemble methods and what would be considered empirical ones.

Ensemble forecasting, which combines different sources of uncertainty, provides an alternative to the conventional modeling approach. Ensemble forecasting was perhaps first used in weather forecasts (Blum, 2019), and is intended to extract more information out of available data and to incorporate uncertainties in modeling. The resulting ensemble models have higher accuracy and better reliability, and produce model outputs that are more useful as decision support tools. The defining characteristic of ensemble models is the combination of outputs from different models and data from different sources. Philosophically, this combination of data and models constitutes an aggregation of information, since different models can extract different pieces of information embedded within the data (Winkler, 1989); data from different sources also contain non-overlapping pieces of information that can be combined by ensemble models.

Ensemble model outputs can include a range of possible outcomes from parallel base models, instead of a single number. Real-world events have many possibilities, for which models only provide an ‘estimate’ of what is likely to happen. In this light, different models rely on different assumptions that provide different perspectives for prediction. The performance of models is measured in probabilities of being correct, so even the lowest performing model still has a small chance of being correct. The job of ensemble models is to incorporate these uncertainties into an ensemble forecast.
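As a minimal sketch of what such a range might look like, the snippet below summarizes the outputs of several parallel base models as a low value, a median, and a high value rather than a single number. The model names and forecast values are made up for illustration and do not come from the paper; `summarize_ensemble` is a hypothetical helper.

```python
import statistics


def summarize_ensemble(forecasts):
    """Summarize base-model forecasts as a range instead of one number."""
    values = sorted(forecasts.values())
    return {
        "low": values[0],
        "median": statistics.median(values),
        "high": values[-1],
        "spread": values[-1] - values[0],
    }


# Hypothetical daily ridership forecasts from four base models (made-up numbers).
base_forecasts = {
    "linear": 1180.0,
    "logit_share": 1250.0,
    "random_forest": 1215.0,
    "neural_net": 1320.0,
}

summary = summarize_ensemble(base_forecasts)
print(summary)
```

The spread between the lowest and highest base forecast gives a rough, unweighted indication of disagreement among models; it is not a calibrated confidence interval.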

Transport modeling shares the same uncertainties in data and in models as weather, economic, and political forecasting, and yet ensemble forecasting remains rare in transport. Through the use of ensemble models, different theories, different assumptions on data generation processes, and data from different sources with slight (or significant) variations can be combined to present multiple possibilities in an ensemble forecast. Ensemble forecasting provides an opportunity to improve transport models. In this paper we apply ensemble forecasting to a few transport-related cases to test the performance of ensemble models.

There are ‘two cultures’ (Breiman, 2001) in modeling, namely theory-driven models and data-driven models (terminology from Van Cranenburgh et al. (2021)). Theory-driven models have predetermined assumptions on the data generation process (e.g. linear, logit, etc.), and model calibration aims to find parameters that suit the data. Data-driven models provide an alternative to theory-driven models by not imposing assumptions on the data generation process in the same way, and instead focusing on the data to extract and reproduce its patterns. The class of data-driven machine learning models can be further divided into generative and discriminative modeling. Generative modeling attempts to learn the joint distribution of the data, which can be used to generate new cases and to make predictions using Bayes' rule (Ng and Jordan, 2002); discriminative modeling divides the data space and discriminates cases directly based on explanatory variables. Each of the two cultures of modeling has its advantages, and with ensemble forecasting, these two cultures can be combined.
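One way to picture combining the two cultures is a weighted average of a theory-driven and a data-driven model. The sketch below is an illustration under stated assumptions, not the method used in the paper: it fits a straight line (standing in for a theory-driven model with a fixed functional form) and a k-nearest-neighbour average (standing in for a data-driven model) to synthetic data, then weights each by the inverse of its validation mean squared error. The data-generating function, the sample sizes, and all names are hypothetical.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a theory-driven model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour average (stand-in for a data-driven model)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given cases."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)


def sample(n):
    """Synthetic data: a linear trend plus a nonlinear term and noise (assumed)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 3.0 * math.sin(x) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


train, valid, test = sample(200), sample(100), sample(100)
models = {"linear": fit_linear(*train), "knn": fit_knn(*train)}

# Weight each model by the inverse of its validation MSE, normalized to sum to 1.
inv = {name: 1.0 / mse(m, *valid) for name, m in models.items()}
total = sum(inv.values())
weights = {name: v / total for name, v in inv.items()}


def ensemble(x):
    """Weighted average of the base-model predictions."""
    return sum(weights[name] * models[name](x) for name in models)


for name, m in models.items():
    print(f"{name:8s} test MSE: {mse(m, *test):.3f}")
print(f"ensemble test MSE: {mse(ensemble, *test):.3f}")
```

Because the weights are non-negative and sum to one, the ensemble's squared error on any data set cannot exceed that of the worse base model; whether it beats the better base model depends on how correlated the two models' errors are. Inverse-error weighting is only one simple choice among many combination schemes.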

Methods of ensemble forecasting are highly siloed across disciplines, and often serve different purposes. For instance, in weather forecasting, ensemble models are used to plot possible paths of storms and to dilute accumulated error over time, whereas many other disciplines cite accuracy gain as the major motivation for using ensemble models. Bits and pieces of ensemble methods are used by different disciplines, without recognizing the big picture. The full potential of ensemble forecasting is not being realized, and many of its benefits do not cross disciplinary barriers. In this paper we review and synthesize different parts of ensemble forecasting into a unified framework, with our addition of ensembles of ensembles.

Section snippets

Review of ensemble methods

This section examines the literature for applications of ensemble models. The scope of this review covers ensemble models both within and outside of transport research, covering both methods of ensemble forecasting, and author-stated objectives of applying ensemble methods, to ascertain what types of ensemble models are in existence, and what these models are used for.

There are different levels of ensemble models. Degenerate forms of ensemble models that include only one model formulation are

Sources of forecast uncertainties

Uncertainties are unknown pieces of information not covered by forecast models. Models are best with the ‘known knowns’, which are strictly deterministic, and are better with risk (the ‘known unknowns’ to follow the Rumsfeld framework (Rumsfeld, 2011)) than uncertainty (the ‘unknown unknowns’), where risk can be quantified and measured but uncertainty cannot. Transport problems include significant uncertainties from various sources, which is the root cause of forecast errors. Different types of

Synthesis of ensemble methods

Within transport modeling, there are currently no systematic methods, or rules of thumb, for identifying suitable ensemble forecasting solutions for different scenarios. Ensemble forecasting generally does not work ‘out-of-the-box’, because each specific transport problem has its unique data availability, different sources of uncertainties, and requirements for forecast accuracy and reliability; different ensemble methods can also produce different types of ensemble model output, from a single

Application in transport related cases

In this section we test the performance of base models against ensemble models and ensemble of ensembles, in three transport related cases. Ensemble models generally require more data to calibrate than single models. Advances in technology and new methods of data collection provide continuous improvement in both the quantity and quality of data; the three transport related cases tested in this paper utilize recently available data that have sufficient quality, and enough data points in order to

Conclusion

Prevailing transport modeling practice relies heavily on finding the best single model, using that single model for forecasts, and presenting model outputs as a single number. This paper points out the folly in such practices, and summarizes problems with the single model procedure, as not considering real-world uncertainties in data generation mechanisms, measurement, and model specifications. The single model procedure also produces a gap between model outputs, which are deterministic, and

CRediT authorship contribution statement

Hao Wu: Conceptualization, Methodology, Writing - original draft. David Levinson: Supervision, Writing - review & editing.

References (96)

  • Ren, Liqun et al. An optimal neural network and concrete strength modeling. Adv. Eng. Softw. (2002)
  • Sanger, Terence D. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw. (1989)
  • Servizi, Valentino et al. Stop detection for smartphone-based travel surveys using geo-spatial context and artificial neural networks. Transp. Res. C (2020)
  • Shahriari, M. et al. Using the analog ensemble method as a proxy measurement for wind power predictability. Renew. Energy (2020)
  • Wei, Yu et al. Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks. Transp. Res. C (2012)
  • Winkler, Robert L. Combining forecasts: A philosophical basis and some current issues. Int. J. Forecast. (1989)
  • Wolpert, David H. Stacked generalization. Neural Netw. (1992)
  • Xiao, Yi et al. Application of multiscale analysis-based intelligent ensemble modeling on airport traffic forecast. Transp. Lett. (2015)
  • Xing, Yang et al. An ensemble deep learning approach for driver lane change intention inference. Transp. Res. C (2020)
  • Zhou, Ligang et al. Least squares support vector machines ensemble models for credit scoring. Expert Syst. Appl. (2010)
  • Alonso, William. Location and Land Use (1964)
  • Anda, Cuauhtemoc et al. Transport modelling in the age of big data. Int. J. Urban Sci. (2017)
  • Armstrong, J. Scott. Combining forecasts
  • Ashton, Alison Hubbard et al. Aggregating subjective forecasts: Some empirical results. Manage. Sci. (1985)
  • Residential Property Transaction Data 2017 - 2019 (2019)
  • Bacon, Robert W. Some evidence on the largest squared correlation coefficient from several samples. Econometrica (1977)
  • Bates, John M. et al. The combination of forecasts. J. Oper. Res. Soc. (1969)
  • Blum, Andrew
  • Boyce, David E. et al. Forecasting Urban Travel: Past, Present and Future (2015)
  • Breiman, Leo. Stacked regressions. Mach. Learn. (1996)
  • Breiman, Leo. Heuristics of instability and stabilization in model selection. Ann. Statist. (1996)
  • Breiman, Leo. Statistical modeling: The two cultures. Statist. Sci. (2001)
  • Chand, Nanak et al. A comparative analysis of SVM and its stacking with other classification algorithm for intrusion detection
  • Cheng, Long et al. Applying an ensemble-based model to travel choice behavior in travel demand forecasting under uncertainties. Transp. Lett. (2019)
  • Transportation Network Providers - Trips (2019)
  • Cowgill, Bo, Dell’Acqua, Fabrizio, Deng, Samuel, Hsu, Daniel, Verma, Nakul, Chaintreau, Augustin, 2020. Biased...
  • Dawes, Robyn M. The robust beauty of improper linear models in decision making. Am. Psychol. (1979)
  • Delle Monache, Luca et al. Probabilistic weather prediction with an analog ensemble. Mon. Weather Rev. (2013)
  • The Ensemble Prediction System (2013)
  • Elliott, Graham. Averaging and the Optimal Combination of Forecasts (2011)
  • Fama, Eugene F. Efficient market hypothesis (1960)
  • Garrison, William L. et al. The Transportation Experience: Policy, Planning, and Deployment (2014)
  • Gustafsson, Nils. Statistical issues in weather forecasting. Scand. J. Stat. (2002)
  • Haider, Murtaza. Diminishing returns to density and public transit. Transp. Findings (2019)
  • Hong, Lu et al. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proc. Natl. Acad. Sci. (2004)
  • Ji, Ang et al. Injury severity prediction from two-vehicle crash mechanisms with machine learning and ensemble models. IEEE Open J. Intell. Transp. Syst. (2020)
  • Kahneman, Daniel. Thinking, Fast and Slow (2011)
  • Kang, Heejoon. Unstable weights in the combination of forecasts. Manage. Sci. (1986)