Adaptive watermark generation mechanism based on time series prediction for stream processing
Frontiers of Computer Science (IF 3.4), Pub Date: 2021-07-18, DOI: 10.1007/s11704-020-0206-7
Yang Song (1, 2), Yunchun Li (1, 2), Hailong Yang (1, 2), Wei Li (1, 2), Jun Xu (3), Zerong Luan (4)

Data stream processing frameworks process stream data based on event time so that requests can be answered in real time. In practice, streaming data usually arrives out of order because of factors such as network delay. Data stream processing frameworks commonly adopt a watermark mechanism to handle this disorder. A watermark is a special timestamped record inserted into the data stream; it helps the framework decide whether received data is late and should therefore be discarded. Traditional watermark generation strategies are periodic, so they cannot dynamically adjust watermark distribution to balance responsiveness and accuracy. To address this limitation, this paper proposes an adaptive watermark generation mechanism based on a time series prediction model. The mechanism dynamically adjusts the frequency and timing of watermark distribution using the disordered data ratio and other lateness properties of the data stream, improving system responsiveness while maintaining acceptable result accuracy. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experimental results show that our mechanism outperforms existing watermark distribution strategies in both system responsiveness and result accuracy.
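The abstract does not spell out the authors' algorithm, so the following is only a minimal Python sketch of the concepts it names, assuming a toy model in which events arrive in fixed emission periods (batches) of integer millisecond timestamps. `PeriodicWatermark` illustrates the fixed-delay, periodic strategy the abstract calls traditional: an event whose timestamp is at or below the current watermark is treated as late and dropped. `AdaptiveWatermark`, `disorder_ratio`, and the smoothing factor `alpha` are hypothetical names, and simple exponential smoothing is swapped in as a stand-in for the paper's time series prediction model. None of this is Flink's API.

```python
import math
from typing import Iterable, List, Optional


def disorder_ratio(timestamps: Iterable[int]) -> float:
    """Fraction of events whose timestamp is older than an event already seen."""
    running_max: Optional[int] = None
    total = out_of_order = 0
    for ts in timestamps:
        total += 1
        if running_max is not None and ts < running_max:
            out_of_order += 1
        else:
            running_max = ts
    return out_of_order / total if total else 0.0


class PeriodicWatermark:
    """Toy fixed-delay generator: at the end of each period it emits
    (max event time seen so far) - max_delay. Not Flink's API."""

    def __init__(self, max_delay: int) -> None:
        self.max_delay = max_delay
        self.max_ts: Optional[int] = None
        self.watermark: float = float("-inf")

    def process_batch(self, timestamps: Iterable[int]) -> List[int]:
        """Consume one period's events and return those dropped as late."""
        late: List[int] = []
        for ts in timestamps:
            # Watermark W asserts no more events with ts <= W will arrive,
            # so any such event is late and is discarded.
            if ts <= self.watermark:
                late.append(ts)
            elif self.max_ts is None or ts > self.max_ts:
                self.max_ts = ts
        if self.max_ts is not None:
            # Watermarks never move backwards.
            self.watermark = max(self.watermark, self.max_ts - self.max_delay)
        return late


class AdaptiveWatermark(PeriodicWatermark):
    """Hypothetical adaptive variant: the delay bound for each period is a
    forecast of observed lateness. Simple exponential smoothing stands in
    for the paper's time series prediction model."""

    def __init__(self, initial_delay: int, alpha: float = 0.5) -> None:
        super().__init__(initial_delay)
        self.alpha = alpha  # smoothing factor (hypothetical parameter)

    def process_batch(self, timestamps: Iterable[int]) -> List[int]:
        batch = list(timestamps)
        # Largest lateness in this period: how far an event trails the newest one.
        observed = 0
        running_max = self.max_ts
        for ts in batch:
            if running_max is None or ts > running_max:
                running_max = ts
            else:
                observed = max(observed, running_max - ts)
        late = super().process_batch(batch)  # emit using the current forecast
        # Forecast next period's lateness; round up to stay on the cautious side.
        forecast = self.alpha * observed + (1 - self.alpha) * self.max_delay
        self.max_delay = math.ceil(forecast)
        return late


if __name__ == "__main__":
    in_order = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    fixed, adaptive = PeriodicWatermark(10), AdaptiveWatermark(10)
    for batch in in_order:
        fixed.process_batch(batch)
        adaptive.process_batch(batch)
    print(fixed.watermark, adaptive.watermark)  # prints: -1 6
```

With in-order input the adaptive delay shrinks from 10 to 2 across three periods, so its watermark reaches 6 while the fixed-delay watermark is still at -1. When observed lateness grows, the forecast delay grows with it, which holds the watermark back and drops fewer late events at the cost of responsiveness; this is the trade-off the abstract describes, shown here only under the toy assumptions above.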

Updated: 2021-07-19