A Framework for Similarity Search in Streaming Time Series based on Spark Streaming

Giao, Bui Cong; Vinh, Phan Cong

doi:10.1007/s11036-022-01988-6

Published: 11 June 2022

Mobile Networks and Applications, Volume 27, pages 2084–2097 (2022)

Bui Cong Giao¹ &
Phan Cong Vinh²

Abstract

Similarity search in streaming time series is a challenging problem because of the tight requirements on processing streaming data and returning results, e.g., a high-speed time-series stream must be processed quickly, and the matches found must be reported accurately to a query system. These difficulties call for a framework that researchers in time-series data mining can use to build systems for similarity search in streaming time series on a platform that specializes in handling streaming data. In this paper, we introduce a framework for similarity search in streaming time series based on Spark Streaming. We then propose a prototype system that implements the framework to demonstrate its feasibility for building similarity search systems that work efficiently and effectively in a streaming context. In addition, the prototype system takes advantage of SUCR-DTW to perform similarity search efficiently in a streaming environment under Dynamic Time Warping. The experimental results obtained from the prototype system demonstrate that the Spark job of similarity search in streaming time series is completed quickly and accurately. The subsequences of the streaming time series that are similar to predefined queries are found in near real time, and they are the same as those obtained by a reference system executing the same similarity search. Furthermore, the prototype system is highly scalable and works stably while processing time-series streams at a high, steady rate. These experimental results also underline the value of combining Spark Streaming and SUCR-DTW to handle this challenging problem.
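To make the task in the abstract concrete, the following is a minimal plain-Python sketch of subsequence similarity search over a stream under Dynamic Time Warping. It is not SUCR-DTW and does not use Spark Streaming; it only illustrates the underlying idea of comparing each newly completed sliding window with a predefined query. The function names `dtw_distance` and `search_stream`, the squared-error point cost, the Sakoe-Chiba band width, and the threshold are all assumptions made for illustration, and data normalization and lower-bound pruning are omitted.

```python
import math
from collections import deque


def dtw_distance(a, b, band):
    """DTW distance (squared-error cost) between two equal-length
    sequences, restricted to a Sakoe-Chiba band of half-width `band`."""
    n = len(a)
    inf = math.inf
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefix matched with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        # Only cells with |i - j| <= band are reachable.
        for j in range(max(1, i - band), min(n, i + band) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[n]


def search_stream(stream, query, threshold, band=1):
    """Yield (end_index, distance) for every sliding window of the
    stream whose DTW distance to `query` is at most `threshold`."""
    window = deque(maxlen=len(query))  # keeps only the newest values
    for t, value in enumerate(stream):
        window.append(value)
        if len(window) == len(query):
            d = dtw_distance(list(window), query, band)
            if d <= threshold:
                yield t, d


if __name__ == "__main__":
    for end, dist in search_stream([0, 0, 1, 2, 3, 0], [1, 2, 3], 0.5):
        print(end, dist)
```

Recomputing the full DTW matrix for every window is the naive baseline; a production system would need pruning or incremental computation to keep up with a high-rate stream, which is the role the abstract assigns to SUCR-DTW.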

Data availability

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available because they are part of the results of the CS2020-19 project funded by Saigon University (SGU). All results generated by the project are managed by and belong to the funder.

Funding

This research is funded by Saigon University (SGU) under grant number CS2020-19.

Author information

Authors and Affiliations

Faculty of Electronics and Telecommunications, Saigon University, Ho Chi Minh City, Vietnam
Bui Cong Giao
Faculty of Information Technology, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam
Phan Cong Vinh

Contributions

Bui Cong Giao wrote most of the paper, implemented the framework, and ran the experiments. Phan Cong Vinh contributed to the framework design and proofread the paper.

Corresponding author

Correspondence to Bui Cong Giao.

Ethics declarations

Ethics approval

Not Applicable.

Conflict of interests

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Giao, B.C., Vinh, P.C. A Framework for Similarity Search in Streaming Time Series based on Spark Streaming. Mobile Netw Appl 27, 2084–2097 (2022). https://doi.org/10.1007/s11036-022-01988-6

Accepted: 04 April 2022
Published: 11 June 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s11036-022-01988-6
