A Framework for Similarity Search in Streaming Time Series based on Spark Streaming, Mobile Networks and Applications


Mobile Networks and Applications (IF 2.3) Pub Date: 2022-06-11, DOI: 10.1007/s11036-022-01988-6
Bui Cong Giao, Phan Cong Vinh

Similarity search in streaming time series is challenging because of tight requirements on processing streaming data and returning results: a high-speed time-series stream must be processed quickly, and the matches found must be reported accurately to the querying system. These difficulties call for a framework that researchers in time-series data mining can use to build similarity search systems for streaming time series on a platform specialized in handling streaming data. In this paper, we introduce a framework for similarity search in streaming time series based on Spark Streaming. We then propose a prototype system implementing the framework to demonstrate that it is feasible for building similarity search systems that work efficiently and effectively in a streaming context. The prototype system also takes advantage of SUCR-DTW to perform similarity search efficiently in a streaming environment under Dynamic Time Warping. Experimental results from the prototype system show that the Spark job for similarity search in streaming time series completes quickly and accurately. Subsequences of the streaming time series that are similar to predefined queries are found in near real time, and they are the same as those obtained by running the same search on another reference system. Furthermore, the prototype system is highly scalable and works stably while processing time-series streams at a high, steady rate. These results also underline the value of combining Spark Streaming and SUCR-DTW to handle this challenging problem.
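To make the task in the abstract concrete, the following is a minimal sketch of subsequence matching under Dynamic Time Warping over a stream. It is not the authors' framework and it is not SUCR-DTW: it uses the textbook quadratic DTW recurrence on a plain Python iterable, with no Spark Streaming involved. The names `dtw_distance` and `stream_matches`, the squared point-wise cost, the fixed window length equal to the query length, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import deque
from math import inf, sqrt


def dtw_distance(a, b):
    """Textbook DTW between two sequences, O(len(a) * len(b)) time.

    Assumption: squared point-wise cost, square root taken at the end.
    Only two rows of the cost matrix are kept in memory.
    """
    m = len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, len(a) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed warping steps.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return sqrt(prev[m])


def stream_matches(stream, query, threshold):
    """Yield (end_index, distance) for every sliding window of the stream
    whose DTW distance to the query is at most the threshold.

    Hypothetical helper: the window length is fixed to len(query).
    """
    window = deque(maxlen=len(query))
    for t, value in enumerate(stream):
        window.append(value)  # the oldest point drops out automatically
        if len(window) == len(query):
            d = dtw_distance(list(window), query)
            if d <= threshold:
                yield t, d


if __name__ == "__main__":
    for end, dist in stream_matches([0, 0, 1, 2, 3, 0, 0], [1, 2, 3], 0.5):
        print(f"match ending at index {end}, DTW distance {dist:.3f}")
```

Recomputing full DTW for every new point costs quadratic time per window, which is the kind of overhead that a specialized streaming method and a distributed platform are meant to reduce; the sketch only shows what is being computed, not how to compute it at stream speed.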


Updated: 2022-06-12
