Interval forecasts based on regression trees for streaming data
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2019-12-18, DOI: 10.1007/s11634-019-00382-7
Xin Zhao, Stuart Barber, Charles C. Taylor, Zoka Milan

In forecasting, we often require interval forecasts rather than a single point forecast. To track streaming data effectively, an interval forecast should reliably cover the observed data yet be as narrow as possible. To achieve this, we propose two methods based on regression trees: an ensemble method and a method based on a single tree. The ensemble method uses weighted results from the most recent models, whereas the single-tree method retains one model until it becomes necessary to train a new one. We propose a novel method to update the interval forecast adaptively using root mean square prediction errors calculated from the latest data batch. We use wavelet-transformed data to capture long-term information in the variables, and conditional inference trees as the underlying regression tree model. Results show that both methods perform well, achieving good coverage without excessively wide intervals. When the underlying data generation mechanism changes, their performance is initially affected but recovers relatively quickly as time proceeds. The single-tree method requires less computational (CPU) time than the ensemble method. Compared with ARIMA and GARCH modelling, our methods achieve better or similar coverage and width while requiring considerably less CPU time.
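The abstract does not give the update rule itself, so the following is only a minimal sketch of the general idea: build an interval around each point forecast, then rescale its half-width using the root mean square prediction error (RMSPE) of the latest batch. The symmetric form `forecast ± multiplier × RMSPE`, the default multiplier of 1.96, the `predict` callable standing in for a fitted regression tree, and all function names are assumptions made for illustration, not the authors' method.

```python
import math
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]
Batch = Tuple[Sequence[float], Sequence[float]]  # (inputs, observed values)


def rmspe(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square prediction error over one batch."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("batches must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def interval_forecasts(point_forecasts: Sequence[float], scale: float,
                       multiplier: float = 1.96) -> List[Interval]:
    """Symmetric intervals: forecast +/- multiplier * scale (assumed form)."""
    half_width = multiplier * scale
    return [(p - half_width, p + half_width) for p in point_forecasts]


def coverage_and_width(actual: Sequence[float],
                       intervals: Sequence[Interval]) -> Tuple[float, float]:
    """Fraction of observations inside their interval, and mean interval width."""
    hits = sum(1 for a, (lo, hi) in zip(actual, intervals) if lo <= a <= hi)
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits / len(actual), mean_width


def stream_intervals(batches: Sequence[Batch],
                     predict: Callable[[float], float],
                     initial_scale: float,
                     multiplier: float = 1.96) -> List[Tuple[float, float]]:
    """Forecast each batch with the current scale, then update the scale
    from that batch's RMSPE once its observations are available."""
    scale = initial_scale
    results = []
    for inputs, observed in batches:
        forecasts = [predict(x) for x in inputs]
        intervals = interval_forecasts(forecasts, scale, multiplier)
        results.append(coverage_and_width(observed, intervals))
        scale = rmspe(observed, forecasts)  # adapt to the latest batch
    return results
```

Coverage and mean width are the two quantities the abstract trades off against each other: a larger multiplier raises coverage at the cost of wider intervals. In this sketch the model is never retrained; deciding when to refit (single tree) or how to weight recent models (ensemble) is outside what the abstract specifies.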




Updated: 2020-04-20