Selego: robust variate selection for accurate time series forecasting
Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2021-07-28, DOI: 10.1007/s10618-021-00777-1
Manoj Tiwaskar 1 , Yash Garg 1 , Xinsheng Li 1 , K. Selçuk Candan 1 , Maria Luisa Sapino 2

Naïve extensions of uni-variate prediction techniques increase the cost of multi-variate model learning and significantly degrade model performance. In this paper, we first argue that (a) one can learn a more accurate forecasting model by leveraging temporal alignments among variates to quantify the importance of the recorded variates with respect to a target variate. We further argue that (b) for this purpose, temporal correlation should be quantified not in terms of series similarity, but in terms of temporal alignments of key “events” impacting these series. Finally, we argue that (c) while learning a temporal model using recurrence-based techniques (such as RNN and LSTM, even when leveraging attention strategies) is difficult and costly, we can achieve better performance by coupling simpler CNNs with an adaptive variate selection strategy. Relying on these arguments, we propose the Selego framework (Selego is a word of Latin origin meaning “selection”) for variate selection. We experimentally evaluate the proposed approach on various forecasting models, such as LSTM, RNN, and CNN, for different top-X% variate subsets and different forecasting lead times, on multiple real-world datasets. Experiments show that the proposed framework can offer significant (90-98%) drops in the number of recorded variates needed to train predictive models, while simultaneously boosting accuracy.
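
To make the idea of top-X% variate selection concrete, the following is a minimal sketch, not the Selego algorithm itself. The abstract does not specify how event alignments are scored, so this sketch swaps in a much simpler stand-in: the maximum absolute lagged Pearson correlation between each candidate variate and the target. The function names, the `max_lag` parameter, and the scoring rule are all assumptions made for illustration.

```python
import numpy as np


def lagged_corr_score(x, target, max_lag):
    """Stand-in importance score (an assumption, NOT Selego's event-alignment
    measure): the largest |Pearson correlation| between x and the target
    when x leads the target by 0..max_lag time steps."""
    best = 0.0
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]   # candidate values at time t
        b = target[lag:]        # target values at time t + lag
        if a.std() == 0 or b.std() == 0:
            continue            # correlation is undefined for constant series
        r = np.corrcoef(a, b)[0, 1]
        best = max(best, abs(r))
    return best


def select_top_variates(data, target_idx, top_frac, max_lag=5):
    """Rank all non-target variates by the stand-in score and keep the
    top `top_frac` fraction (at least one).

    data: array of shape (time_steps, num_variates)
    Returns the selected column indices, best first.
    """
    target = data[:, target_idx]
    candidates = [j for j in range(data.shape[1]) if j != target_idx]
    scores = np.array(
        [lagged_corr_score(data[:, j], target, max_lag) for j in candidates]
    )
    k = max(1, int(np.ceil(top_frac * len(candidates))))
    order = np.argsort(-scores)[:k]  # indices of the k highest scores
    return [candidates[i] for i in order]
```

With `top_frac=0.1`, only the highest-scoring 10% of candidate variates would be passed on to a downstream forecasting model (for example a CNN), which mirrors the kind of reduction the abstract reports, although the scoring rule here is only a placeholder.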




Updated: 2021-07-28