A Framework for Similarity Search in Streaming Time Series based on Spark Streaming

Mobile Networks and Applications

Abstract

Similarity search in streaming time series is a challenging problem because of tight requirements on processing streaming data and returning results, e.g., quickly processing a high-speed time-series stream and accurately returning the results found to a query system. These difficulties motivate researchers in time-series data mining to seek a framework for building similarity search systems for streaming time series on a platform that specializes in handling streaming data. In this paper, we introduce a framework for similarity search in streaming time series based on Spark Streaming. We then present a prototype system that implements the framework to demonstrate its feasibility for building similarity search systems that work efficiently and effectively in a streaming context. In addition, the prototype system takes advantage of SUCR-DTW to perform similarity search efficiently in a streaming environment under Dynamic Time Warping. The experimental results obtained from the prototype system demonstrate that the Spark job of similarity search in streaming time series is accomplished quickly and accurately. The subsequences of streaming time series that are similar to predefined queries are found in near real time, and they are the same as those obtained by running the similarity search on another reference system. Furthermore, the prototype system is highly scalable and works stably while processing time-series streams arriving at a high, steady rate. These experimental results also underline the value of combining Spark Streaming and SUCR-DTW to handle this challenging problem.
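
To make the task in the abstract concrete, the following is a minimal Python sketch of similarity search over a stream under Dynamic Time Warping. It is not the SUCR-DTW method and does not use Spark Streaming; it is a plain sliding-window search with a Sakoe-Chiba band, written only for illustration. The function names (dtw_distance, search_stream), the band and threshold parameters, and the absence of data normalization are all assumptions made here, not details taken from the paper.

```python
import math
from collections import deque


def dtw_distance(a, b, band):
    """DTW distance between two equal-length sequences.

    Warping is limited to a Sakoe-Chiba band of half-width `band`;
    the point-wise cost is the squared difference (illustrative choice).
    """
    n = len(a)
    if n != len(b):
        raise ValueError("sequences must have equal length")
    inf = math.inf
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        lo = max(1, i - band)
        hi = min(n, i + band)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[n])


def search_stream(stream, query, threshold, band=1):
    """Return (start_index, distance) for every window of the stream
    whose DTW distance to the query is at most `threshold`."""
    window = deque(maxlen=len(query))  # keeps only the newest points
    matches = []
    for t, value in enumerate(stream):
        window.append(value)
        if len(window) == len(query):
            d = dtw_distance(list(window), query, band)
            if d <= threshold:
                matches.append((t - len(query) + 1, d))
    return matches
```

Calling search_stream(stream, query, threshold) on any iterable of numbers returns the start positions of matching subsequences as the points arrive. This naive version recomputes the full DTW for every window, which is the cost that specialized methods for streaming data aim to reduce.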

Data availability

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available because they are part of the results of the CS2020-19 project funded by SGU. All results generated by the project are managed by, and belong to, the funder.

Funding

This research is funded by Saigon University (SGU) under grant number CS2020-19.

Author information

Contributions

Bui Cong Giao wrote most of the paper and implemented and experimented with the framework. Phan Cong Vinh contributed to the framework design and proofread the paper.

Corresponding author

Correspondence to Bui Cong Giao.

Ethics declarations

Ethics approval

Not Applicable.

Conflict of interests

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Giao, B.C., Vinh, P.C. A Framework for Similarity Search in Streaming Time Series based on Spark Streaming. Mobile Netw Appl 27, 2084–2097 (2022). https://doi.org/10.1007/s11036-022-01988-6

  • DOI: https://doi.org/10.1007/s11036-022-01988-6
