LPG-model: A novel model for throughput prediction in stream processing, using a light gradient boosting machine, incremental principal component analysis, and deep gated recurrent unit network, Information Sciences

Information Sciences, Pub Date: 2020-05-23, DOI: 10.1016/j.ins.2020.05.042
Zheng Chu, Jiong Yu, Askar Hamdulla

In recent years, the volume and velocity of streaming data have grown rapidly, and real-time processing scenarios for streaming data have become increasingly common. Stream processing tasks face major challenges in load optimization, task scheduling, and resource management, and throughput prediction for stream processing tasks is a key technology in all of these areas. To predict the throughput of stream processing tasks accurately and efficiently, we propose a novel model named the LPG-model. It consists of three main components: a light gradient boosting machine (LightGBM), incremental principal component analysis (IPCA), and an evolving deep gated recurrent unit (GRU) network. Unlike existing state-of-the-art models, the LPG-model offers not only a network structure adaptation mechanism (a hidden layer adaptation mechanism) but also feature processing mechanisms for streaming data. During data preprocessing, an incremental interpolation mechanism fills in missing values, and incremental normalization mechanisms provide two normalization methods for features. An efficient dimensionality reduction mechanism built on LightGBM and IPCA improves the prediction efficiency of the LPG-model. The hidden layer growing mechanism of the evolving deep GRU network can learn new knowledge from data streams while retaining previously learned knowledge, and the network also captures the temporal structure of the data streams. Experimental results on four open-source benchmarks show that, under the prequential test-then-train protocol, the LPG-model is more accurate and efficient than state-of-the-art algorithms and networks. This demonstrates the effectiveness of the LPG-model for throughput prediction in stream processing tasks. Furthermore, numerical results on standard data stream benchmark problems indicate that the LPG-model has the potential to reduce execution time on high-dimensional data streams while maintaining high classification accuracy.
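The abstract does not spell out how the incremental interpolation, the incremental normalization, or the prequential evaluation work. The sketch below is minimal and rests on three assumptions. Missing values are filled with each feature's running mean. Features are normalized with a running z-score based on Welford's online mean and variance, taken as one plausible choice of the two unnamed normalization methods. Every sample is first used to test the model and then to train it, following the prequential test-then-train protocol. All class and function names (`IncrementalImputer`, `IncrementalZScore`, `OnlineLinearRegressor`, `prequential_errors`) are hypothetical. A small SGD linear regressor stands in for the LightGBM/IPCA/evolving deep GRU pipeline, which is not reproduced here.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple


class IncrementalImputer:
    """Hypothetical stand-in for incremental interpolation: a missing value
    (None) is replaced by the running mean of that feature observed so far."""

    def __init__(self, n_features: int) -> None:
        self.count = [0] * n_features
        self.mean = [0.0] * n_features

    def update_transform(self, x: Sequence[Optional[float]]) -> List[float]:
        out: List[float] = []
        for i, v in enumerate(x):
            if v is None:
                out.append(self.mean[i])  # 0.0 until feature i has been observed
            else:
                self.count[i] += 1
                self.mean[i] += (v - self.mean[i]) / self.count[i]
                out.append(float(v))
        return out


class IncrementalZScore:
    """Running z-score normalization via Welford's online mean/variance.
    Assumed normalization scheme; the abstract does not name its two methods."""

    def __init__(self, n_features: int) -> None:
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations

    def update_transform(self, x: Sequence[float]) -> List[float]:
        self.n += 1
        out: List[float] = []
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])
            std = math.sqrt(self.m2[i] / self.n)  # population std so far
            out.append((v - self.mean[i]) / std if std > 0 else 0.0)
        return out


class OnlineLinearRegressor:
    """Tiny SGD linear model; a placeholder for the evolving deep GRU network."""

    def __init__(self, n_features: int, lr: float = 0.05) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def learn(self, x: Sequence[float], y: float) -> None:
        err = self.predict(x) - y  # gradient of 0.5 * squared error w.r.t. the prediction
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err


def prequential_errors(
    stream: Iterable[Tuple[Sequence[Optional[float]], float]], n_features: int
) -> List[float]:
    """Prequential (test-then-train) evaluation: each sample is predicted
    before its label is used for training, so no hold-out set is needed."""
    imputer = IncrementalImputer(n_features)
    scaler = IncrementalZScore(n_features)
    model = OnlineLinearRegressor(n_features)
    errors: List[float] = []
    for x_raw, y in stream:
        x = scaler.update_transform(imputer.update_transform(x_raw))
        errors.append(abs(model.predict(x) - y))  # 1) test on the unseen sample
        model.learn(x, y)                          # 2) then train on the same sample
    return errors
```

The model predicts each sample before it sees the matching label, so the accumulated absolute error is an online estimate of performance. In a fuller pipeline, a feature-selection step (such as a LightGBM importance ranking) and an incremental projection (such as IPCA) would plausibly sit between the scaler and the model. They are left out here to keep the sketch self-contained.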

Updated: 2020-05-23