Scalable data series subsequence matching with ULISSE

  • Regular Paper
  • The VLDB Journal

Abstract

Data series similarity search is an important operation at the core of several analysis tasks and applications related to data series collections. Although data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range). Our contribution is twofold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different lengths. Second, based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports non-Z-normalized and Z-normalized sequences and can be used without changes with both Euclidean distance and dynamic time warping, for answering both k-NN and \(\epsilon \)-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost when compared to competing approaches.
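
To make the query setting concrete, the following is a minimal brute-force sketch of variable-length subsequence search under Euclidean distance, supporting both k-NN and range queries. It is not the ULISSE index or its algorithms; it is only the naive sequential scan that an index of this kind aims to avoid. The function names (`knn_subsequences`, `range_subsequences`) and the in-memory list-of-lists layout are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def all_matches(collection, query):
    """Yield (distance, series_id, offset) for every subsequence
    of the collection that has the same length as the query."""
    m = len(query)
    for sid, series in enumerate(collection):
        for off in range(len(series) - m + 1):
            yield euclidean(series[off:off + m], query), sid, off


def knn_subsequences(collection, query, k=1):
    """Hypothetical helper: the k closest subsequences to the query."""
    return sorted(all_matches(collection, query))[:k]


def range_subsequences(collection, query, eps):
    """Hypothetical helper: all subsequences within distance eps."""
    return sorted(hit for hit in all_matches(collection, query) if hit[0] <= eps)


# The query length is not fixed in advance: any length up to the
# series length can be searched with the same code.
collection = [[0, 1, 2, 3, 4], [5, 5, 5, 5]]
print(knn_subsequences(collection, [2, 3], k=1))
print(knn_subsequences(collection, [1, 2, 3], k=1))
print(range_subsequences(collection, [5, 5], eps=0.0))
```

The scan touches every subsequence of the requested length for each query, which is why its cost grows quickly with collection size.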

Notes

  1. If the dimension that imposes the ordering of the sequence is time, then we talk about time series. However, a series can also be defined over other measures (e.g., angle in radial profiles in astronomy, mass in mass spectroscopy in physics, etc.). We use the terms data series, time series and sequence interchangeably.

  2. http://www.esa.int/Our_Activities/Observing_the_Earth/

  3. http://www.airbus.com/

  4. Z-normalization transforms a series so that it has a mean value of zero, and a standard deviation of one. This allows similarity search to be effective, irrespective of shifting (i.e., offset translation) and scaling [51].

  5. A preliminary version of this work has appeared elsewhere [53, 54].
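
As an illustration of Note 4, here is a minimal sketch of Z-normalization in plain Python. The function name `z_normalize`, the use of the population standard deviation (dividing by n), and the handling of constant series are assumptions of this sketch, not details taken from the article.

```python
import math


def z_normalize(series):
    """Return a copy of the series with mean 0 and standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    # Assumption: population standard deviation (divide by n).
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # Assumption: a constant series is mapped to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


# A shifted and scaled copy normalizes to the same values,
# which is what makes the comparison offset- and scale-invariant.
base = z_normalize([1, 2, 3])
shifted_scaled = z_normalize([3, 5, 7])  # 2 * x + 1
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(base, shifted_scaled)))
```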

References

  1. Kashino, K., Smith, G., Murase, H.: Time-series active search for quick retrieval of audio and video. In: ICASSP, (1999)

  2. Raza, U., Camerra, A., Murphy, A.L., Palpanas, T., Picco, G.P.: Practical data prediction for real-world wireless sensor networks. IEEE Trans. Knowl. Data Eng. 27(8), 2231–2244 (2015)

  3. Shasha, D.: Tuning time series queries in finance: Case studies and recommendations. IEEE Data Eng. Bull. 22(2), 40–46 (1999)

  4. Huijse, P., Estévez, P.A., Protopapas, P., Principe, J.C., Zegers, P.: Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Comput. Intell. Mag. 9(3), 27–39 (2014)

  5. Palpanas, T.: Data series management: the road to big sequence analytics. SIGMOD Rec. 44(2), 47–52 (2015)

  6. ESA. SENTINEL-2 mission. https://sentinel.esa.int/web/sentinel/missions/sentinel-2

  7. Zoumpatianos, K., Palpanas, T.: Data series management: Fulfilling the need for big sequence analytics. In: ICDE, (2018)

  8. Palpanas, T., Beckmann, V.: Report on the first and second interdisciplinary time series analysis workshop (ITISA). SIGMOD Rec. 48(3), 36–40 (2019)

  9. Bagnall, A.J., Cole, R.L., Palpanas, T., Zoumpatianos, K.: Data series management. Dagstuhl Reports 9(7), 47–52 (2019)

  10. Niennattrakul, V., Ratanamahatana, C. A.: On clustering multimedia time series data using k-means and dynamic time warping. MUE ’07, (2007)

  11. Lines, J., Bagnall, A.: Time series classification with ensembles of elastic distance measures. DAMI 29(3), 565–592 (2015)

  12. Senin, P., Lin, J., Wang, X., Oates, T., Gandhi, S., Boedihardjo, A.P., Chen, C., Frankenstein, S.: Time series anomaly discovery with grammar-based compression. In: EDBT, (2015)

  13. Boniol, P., Linardi, M., Roncallo, F., Palpanas, T.: Automated Anomaly Detection in Large Sequences. In: ICDE, (2020)

  14. Boniol, P., Palpanas, T.: Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series. PVLDB, (2020)

  15. Zoumpatianos, K., Idreos, S., Palpanas, T.: Indexing for interactive exploration of big data series. In: SIGMOD, (2014)

  16. Palpanas, T.: Big sequence management: a glimpse of the past, the present, and the future. In: SOFSEM, (2016)

  17. Palpanas, T.: The parallel and distributed future of data series mining. In: HPCS, (2017)

  18. Gogolou, A., Tsandilas, T., Palpanas, T., Bezerianos, A.: Progressive similarity search on time series data. In: BigVis, in Conjunction with EDBT/ICDT, (2019)

  19. Gogolou, A., Tsandilas, T., Echihabi, K., Bezerianos, A., Palpanas, T.: Data series progressive similarity search with probabilistic quality guarantees. In: SIGMOD (2020)

  20. Echihabi, K., Zoumpatianos, K., Palpanas, T., Benbrahim, H.: The lernaean hydra of data series similarity search: an experimental evaluation of the state of the art. PVLDB 12(2), 112–127 (2018)

  21. Echihabi, K., Zoumpatianos, K., Palpanas, T., Benbrahim, H.: Return of the lernaean hydra: experimental evaluation of data series approximate similarity search. PVLDB 13(3), 403–420 (2019)

  22. Palpanas, T.: Evolution of a Data Series Index—The iSAX family of data series indexes. In: CCIS, (2020)

  23. Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: SIGMOD, (1994)

  24. Rafiei, D., Mendelzon, A.: Efficient retrieval of similar time sequences using dft. In: ICDE, (1998)

  25. Keogh, E.J., Palpanas, T., Zordan, V.B., Gunopulos, D., Cardle, M.: Indexing large human-motion databases. In: VLDB, (2004)

  26. Assent, I., Krieger, R., Afschari, F., Seidl, T.: The ts-tree: Efficient time series search and retrieval. In: EDBT, (2008)

  27. Shieh, J., Keogh, E.J.: isax: indexing and mining terabyte sized time series. In: KDD, pp. 623–631, (2008)

  28. Kadiyala, S., Shiri, N.: A compact multi-resolution index for variable length queries in time series databases. KAIS 15(2), 131–147 (2008)

  29. Wang, Y., Wang, P., Pei, J., Wang, W., Huang, S.: A data-adaptive and dynamic segmentation index for whole matching on time series. PVLDB 6(10), 793–804 (2013)

  30. Camerra, A., Shieh, J., Palpanas, T., Rakthanmanon, T., Keogh, E.J.: Beyond one billion time series: indexing and mining very large time series collections with isax2+. KAIS, (2014)

  31. Dallachiesa, M., Palpanas, T., Ilyas, I.F.: Top-k nearest neighbor search in uncertain data series. PVLDB 8(1), 13–24 (2014)

  32. Zoumpatianos, K., Idreos, S., Palpanas, T.: RINSE: interactive data series exploration with ADS+. PVLDB 8(12), 1912–1915 (2015)

  33. Zoumpatianos, K., Idreos, S., Palpanas, T.: ADS: the adaptive data series index. VLDB J. 25(6), 843–866 (2016)

  34. Yagoubi, D.E., Akbarinia, R., Masseglia, F., Palpanas, T.: Dpisax: Massively distributed partitioned isax. In: ICDM, (2017)

  35. Yagoubi, D.-E., Akbarinia, R., Masseglia, F., Palpanas, T.: Massively distributed time series indexing and querying. TKDE 32(1), 108–120 (2020)

  36. Peng, B., Fatourou, P., Palpanas, T.: Paris: The next destination for fast data series indexing and query answering. In: IEEE Big Data, (2018)

  37. Peng, B., Palpanas, T., Fatourou, P.: Paris+: Data series indexing on multi-core architectures. In: TKDE, (2020)

  38. Peng, B., Palpanas, T., Fatourou, P.: Messi: In-memory data series indexing. In: ICDE, (2020)

  39. Peng, Botao: (supervised by Panagiota Fatourou and Themis Palpanas). Data Series Indexing Gone Parallel. In: ICDE PhD Workshop, (2020)

  40. Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T.: Coconut: a scalable bottom-up approach for building data series indexes. PVLDB 11(6), 677–690 (2018)

  41. Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T.: Coconut palm: Static and streaming data series exploration now in your palm. In: SIGMOD, (2019)

  42. Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T.: Coconut: sortable summarizations for scalable indexes over static and streaming data series. VLDBJ 28(6), 847–869 (2019)

  43. Kahveci, T., Singh, A.: Variable length queries for time series data. In: ICDE, (2001)

  44. Rakthanmanon, T., Campana, B.J.L., Mueen, A., Batista, G. E. A.P.A., Westover, M.B., Zhu, Q., Zakaria, J., Keogh, E.J.: Searching and mining trillions of time series subsequences under dynamic time warping. In: SIGKDD, (2012)

  45. Linardi, M., Zhu, Y., Palpanas, T., Keogh, E. J.: Matrix profile X: VALMOD—scalable discovery of variable-length motifs in data series. In: SIGMOD Conference (2018)

  46. Linardi, M., Zhu, Y., Palpanas, T., Keogh, E. J.: VALMOD: A suite for easy and exact detection of variable length motifs in data series. In: SIGMOD Conference (2018)

  47. Linardi, M., Zhu, Y., Palpanas, T., Keogh, E.J.: Matrix Profile Goes MAD: Variable-length motif and discord discovery in data series. In: DAMI, (2020)

  48. Linardi, Michele: (supervised by Themis Palpanas). Effective and Efficient Variable-Length Data Series Analytics. In: VLDB PhD Workshop, (2019)

  49. A.G.H. of Operational Intelligence Department Airbus. Personal communication., (2017)

  50. Rosa, A.C., Parrino, L., Terzano, M.G.: Automatic detection of cyclic alternating pattern (cap) sequences in sleep: preliminary results. Clin. Neurophysiol. 110(4), 585–592 (1999)

  51. Keogh, E.J., Kasetty, S.: On the need for time series data mining benchmarks: A survey and empirical demonstration. DAMI 7(4), 349–371 (2003)

  52. Camerra, A., Palpanas, T., Shieh, J., Keogh, E.J.: isax 2.0: Indexing and mining one billion time series. In: ICDM (2010)

  53. Linardi, M., Palpanas, T.: Scalable, variable-length similarity search in data series: The ULISSE approach. PVLDB 11(13), 2236–2248 (2018)

  54. Linardi, M., Palpanas, T.: ULISSE: ULtra compact index for variable-length similarity SEarch in data series. In: ICDE (2018)

  55. Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. KAIS 3(3), 263–286 (2000)

  56. Loh, W., Kim, S., Whang, K.: A subsequence matching algorithm that supports normalization transform in time-series databases. Data Min. Knowl. Discov. 9(1), 5–28 (2004)

  57. Han, W., Lee, J., Moon, Y., Jiang, H.: Ranked subsequence matching in time-series databases. In: VLDB, (2007)

  58. Wu, J., Wang, P., Pan, N., Wang, C., Wang, W., Wang, J.: Kv-match: A subsequence matching approach supporting normalization and time warping. In: ICDE, (2019)

  59. Mueen, A., Hamooni, H., Estrada, T.: Time series join on subsequence correlation. In: ICDM, (2014)

  60. Kruskal, J., Liberman, M.: The symmetric time-warping problem: From continuous to discrete. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, 01 (1983)

  61. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 46–49 (1978)

  62. Itakura, F.: Minimum prediction residual principle applied to speech recognition. IEEE Trans. Acoust. Speech Signal Process. 23(1), 67–72 (1975)

  63. Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing sax: a novel symbolic representation of time series. DAMI 15(2), 107–144 (2007)

  64. Keogh, E.J., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)

  65. Zoumpatianos, K., Lou, Y., Palpanas, T., Gehrke, J.: Query workloads for data series indexes. In: SIGKDD, (2015)

  66. http://www.mi.parisdescartes.fr/~mlinardi/ULISSE.html

  67. Zoumpatianos, K., Lou, Y., Ileana, I., Palpanas, T., Gehrke, J.: Generating data series query workloads. VLDB J. 27(6), 823–846 (2018)

  68. Lichman, M.: UCI machine learning repository, (2013)

  69. Terzano, M.G., Parrino, L., Sherieri, A., Chervin, R., Chokroverty, S., Guilleminault, C., Hirshkowitz, M., Mahowald, M., Moldofsky, H., Rosa, A., Thomas, R., Walters, A.: Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (cap) in human sleep. Sleep Med. 2(6), 537–553 (2001)

  70. Healey, J.A., Picard, R.W.: Detecting stress during real-world driving tasks using physiological sensors. ITS 6(2), 156–166 (2005)

  71. Soldi, S., Beckmann, V., Baumgartner, W.H., Ponti, G., Shrader, C.R., Lubinski, P., Krimm, H.A., Mattana, F., Tueller, J.: Long-term variability of agn at hard x-rays. Astronomy Astrophys. 563, A57 (2014)

  72. IRIS. Seismic Data Access. http://ds.iris.edu/data/access, (2016)

  73. Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.J.: The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. DAMI, (2017)

  74. Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.J.: Experimental comparison of representation methods and distance measures for time series data. DAMI 26(2), 275–309 (2013)


Author information

Corresponding author

Correspondence to Michele Linardi.

About this article

Cite this article

Linardi, M., Palpanas, T. Scalable data series subsequence matching with ULISSE. The VLDB Journal 29, 1449–1474 (2020). https://doi.org/10.1007/s00778-020-00619-4

  • DOI: https://doi.org/10.1007/s00778-020-00619-4
