Distance variable improvement of time-series big data stream evaluation
Journal of Big Data (IF 8.6), Pub Date: 2020-10-13, DOI: 10.1186/s40537-020-00359-w
Ari Wibisono, Petrus Mursanto, Jihan Adibah, Wendy D. W. T. Bayu, May Iffah Rizki, Lintang Matahari Hasani, Valian Fil Ahli

Mining information in real time from a big dataset of time-series data is a very challenging task. To address it, we propose using the mean distance and the standard deviation to improve the accuracy of the existing Fast Incremental Model Trees with Drift Detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion; we additionally use the mean distance and standard deviation so that the tree is split more accurately than with the standard method. We verify the proposed method on three datasets: the large Traffic Demand Dataset (4,000,000 instances), Tennet’s big wind power plant dataset (435,268 instances), and a road weather dataset (30,000,000 instances). The results show that the proposed FIMT-DD algorithm is more accurate than both the standard method and the Chernoff bound approach. The measured errors show that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method.
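
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of the standard Hoeffding bound and of MAPE. It does not reproduce the authors' mean-distance and standard-deviation modification, which the abstract does not specify. The function names, the default `delta`, and the ratio-based split test (range fixed to 1, as commonly described for FIMT-DD) are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R): range of the observed random variable.
    delta: allowed probability that the bound is violated.
    n: number of observations seen so far.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_best_merit, n, delta=0.01):
    """Hypothetical split test (assumption, not taken from the abstract).

    Compares the ratio of the second-best to the best split merit
    against 1 minus the Hoeffding bound. The ratio lies in [0, 1],
    so the range R is taken to be 1.
    """
    if best_merit <= 0:
        return False
    ratio = second_best_merit / best_merit
    return ratio + hoeffding_bound(1.0, delta, n) < 1.0


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Pairs whose actual value is zero are skipped to avoid division by zero
    (a choice made for this sketch).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no usable (non-zero actual) observations")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

Because the bound shrinks as `n` grows, a clearly better candidate split is accepted after few instances, while near-ties require more data before the tree commits to a split.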

Updated: 2020-10-13