Scalable data series subsequence matching with ULISSE

Linardi, Michele; Palpanas, Themis

doi:10.1007/s00778-020-00619-4

Regular Paper
Published: 04 July 2020

Volume 29, pages 1449–1474, (2020)

The VLDB Journal

Abstract

Data series similarity search is an important operation, at the core of several analysis tasks and applications related to data series collections. Although data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range). Our contribution is twofold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different length. Second, based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports non-Z-normalized and Z-normalized sequences and can be used with no changes with both Euclidean distance and dynamic time warping, for answering both k-NN and \(\epsilon\)-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost when compared to competing approaches.
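
To make the query types named in the abstract concrete, the following is a minimal sketch of k-NN and \(\epsilon\)-range subsequence search under Euclidean distance, answered by a brute-force sequential scan. It is not the ULISSE index or its algorithms; it only illustrates the problem that such an index is meant to accelerate. The function names `knn_subsequences` and `range_subsequences`, and the `min_len`/`max_len` parameters standing in for the supported query-length range, are assumptions made for this illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_subsequences(series, query, k, min_len, max_len):
    """Return the k (distance, offset) pairs closest to `query`.

    Brute-force scan (hypothetical helper, not the ULISSE algorithm):
    every subsequence of length len(query) is compared to the query.
    [min_len, max_len] mimics the length range an index would support.
    """
    m = len(query)
    if not (min_len <= m <= max_len):
        raise ValueError("query length outside the supported range")
    candidates = (
        (euclidean(series[i:i + m], query), i)
        for i in range(len(series) - m + 1)
    )
    # nsmallest keeps the k pairs with the lowest distance (ties by offset).
    return heapq.nsmallest(k, candidates)


def range_subsequences(series, query, eps):
    """Return offsets of all subsequences within distance eps of `query`."""
    m = len(query)
    return [
        i
        for i in range(len(series) - m + 1)
        if euclidean(series[i:i + m], query) <= eps
    ]


series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0]
# Queries of different lengths are answered over the same series.
print(knn_subsequences(series, [1.0, 2.0, 3.0], k=2, min_len=2, max_len=4))
print(knn_subsequences(series, [3.0, 2.0], k=1, min_len=2, max_len=4))
print(range_subsequences(series, [1.0, 2.0, 3.0], eps=0.0))
```

The scan touches every candidate subsequence for every query, which is the cost that a variable-length index aims to avoid by pruning candidates through summarized representations.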

Notes

If the dimension that imposes the ordering of the sequence is time, then we talk about time series. However, a series can also be defined over other measures (e.g., angle in radial profiles in astronomy, mass in mass spectroscopy in physics, etc.). We use the terms data series, time series and sequence interchangeably.
http://www.esa.int/Our_Activities/Observing_the_Earth/
http://www.airbus.com/
Z-normalization transforms a series so that it has a mean value of zero, and a standard deviation of one. This allows similarity search to be effective, irrespective of shifting (i.e., offset translation) and scaling [51].
A preliminary version of this work has appeared elsewhere [53, 54].
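
The Z-normalization described in the note above can be sketched as follows. This is a minimal illustration, assuming the population standard deviation and an all-zero output for constant series; the function name `z_normalize` is hypothetical and not taken from the paper.

```python
import math


def z_normalize(series):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # Assumption: a constant series has no shape, so map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0 + 5.0 * x for x in a]  # same shape as a, shifted and scaled
za, zb = z_normalize(a), z_normalize(b)
# After normalization the two series coincide up to rounding error.
print(all(math.isclose(x, y) for x, y in zip(za, zb)))
```

Because `b` is only an offset and rescaled copy of `a`, both normalize to the same values, which is why similarity search on Z-normalized series is insensitive to shifting and scaling.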

References

Kashino, K., Smith, G., Murase, H.: Time-series active search for quick retrieval of audio and video. In: ICASSP, (1999)
Raza, U., Camerra, A., Murphy, A.L., Palpanas, T., Picco, G.P.: Practical data prediction for real-world wireless sensor networks. IEEE Trans. Knowl. Data Eng. 27(8), 2231–2244 (2015)
Shasha, D.: Tuning time series queries in finance: Case studies and recommendations. IEEE Data Eng. Bull. 22(2), 40–46 (1999)
Huijse, P., Estévez, P.A., Protopapas, P., Principe, J.C., Zegers, P.: Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Comput. Intell. Mag. 9(3), 27–39 (2014)
Palpanas, T.: Data series management: the road to big sequence analytics. SIGMOD Rec. 44(2), 47–52 (2015)
ESA. SENTINEL-2 mission. https://sentinel.esa.int/web/sentinel/missions/sentinel-2
Zoumpatianos, K., Palpanas, T.: Data series management: Fulfilling the need for big sequence analytics. In: ICDE, (2018)
Palpanas, T., Beckmann, V.: Report on the first and second interdisciplinary time series analysis workshop (ITISA). SIGMOD Rec. 48(3), 36–40 (2019)
Bagnall, A.J., Cole, R.L., Palpanas, T., Zoumpatianos, K.: Data series management. Dagstuhl Reports 9(7), 47–52 (2019)
Niennattrakul, V., Ratanamahatana, C. A.: On clustering multimedia time series data using k-means and dynamic time warping. MUE ’07, (2007)
Lines, J., Bagnall, A.: Time series classification with ensembles of elastic distance measures. DAMI 29(3), 565–592 (2015)
Senin, P., Lin, J., Wang, X., Oates, T., Gandhi, S., Boedihardjo, A.P., Chen, C., Frankenstein, S.: Time series anomaly discovery with grammar-based compression. In: EDBT, (2015)
Boniol, P., Linardi, M., Roncallo, F., Palpanas, T.: Automated Anomaly Detection in Large Sequences. In: ICDE, (2020)
Boniol, P., Palpanas, T.: Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series. PVLDB, (2020)
Zoumpatianos, K., Idreos, S., Palpanas, T.: Indexing for interactive exploration of big data series. In: SIGMOD, (2014)
Palpanas, T.: Big sequence management: a glimpse of the past, the present, and the future. In: SOFSEM, (2016)
Palpanas, T.: The parallel and distributed future of data series mining. In: HPCS, (2017)
Gogolou, A., Tsandilas, T., Palpanas, T., Bezerianos, A.: Progressive similarity search on time series data. In: BigVis, in Conjunction with EDBT/ICDT, (2019)
Gogolou, A., Tsandilas, T., Echihabi, K., Bezerianos, A., Palpanas, T.: Data series progressive similarity search with probabilistic quality guarantees. In: SIGMOD (2020)
Echihabi, K., Zoumpatianos, K., Palpanas, T., Benbrahim, H.: The lernaean hydra of data series similarity search: an experimental evaluation of the state of the art. PVLDB 12(2), 112–127 (2018)
Echihabi, K., Zoumpatianos, K., Palpanas, T., Benbrahim, H.: Return of the lernaean hydra: experimental evaluation of data series approximate similarity search. PVLDB 13(3), 403–420 (2019)
Palpanas, T.: Evolution of a Data Series Index—The iSAX family of data series indexes. In: CCIS, (2020)
Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: SIGMOD, (1994)
Rafiei, D., Mendelzon, A.: Efficient retrieval of similar time sequences using dft. In: ICDE, (1998)
Keogh, E.J., Palpanas, T., Zordan, V.B., Gunopulos, D., Cardle, M.: Indexing large human-motion databases. In: VLDB, (2004)
Assent, I., Krieger, R., Afschari, F., Seidl, T.: The ts-tree: Efficient time series search and retrieval. In: EDBT, (2008)
Shieh, J., Keogh, E.J.: isax: indexing and mining terabyte sized time series. In: KDD, pp. 623–631, (2008)
Kadiyala, S., Shiri, N.: A compact multi-resolution index for variable length queries in time series databases. KAIS 15(2), 131–147 (2008)
Wang, Y., Wang, P., Pei, J., Wang, W., Huang, S.: A data-adaptive and dynamic segmentation index for whole matching on time series. PVLDB 6(10), 793–804 (2013)
Camerra, A., Shieh, J., Palpanas, T., Rakthanmanon, T., Keogh, E.J.: Beyond one billion time series: indexing and mining very large time series collections with isax2+. KAIS, (2014)
Dallachiesa, M., Palpanas, T., Ilyas, I.F.: Top-k nearest neighbor search in uncertain data series. PVLDB 8(1), 13–24 (2014)
Zoumpatianos, K., Idreos, S., Palpanas, T.: RINSE: interactive data series exploration with ADS+. PVLDB 8(12), 1912–1915 (2015)
Zoumpatianos, K., Idreos, S., Palpanas, T.: ADS: the adaptive data series index. VLDB J. 25(6), 843–866 (2016)
Yagoubi, D.E., Akbarinia, R., Masseglia, F., Palpanas, T.: Dpisax: Massively distributed partitioned isax. In: ICDM, (2017)
Yagoubi, D.-E., Akbarinia, R., Masseglia, F., Palpanas, T.: Massively distributed time series indexing and querying. TKDE 32(1), 108–120 (2020)
Peng, B., Fatourou, P., Palpanas, T.: Paris: The next destination for fast data series indexing and query answering. In: IEEE Big Data, (2018)
Peng, B., Palpanas, T., Fatourou, P.: Paris+: Data series indexing on multi-core architectures. In: TKDE, (2020)
Peng, B., Palpanas, T., Fatourou, P.: Messi: In-memory data series indexing. In: ICDE, (2020)
Peng, B. (supervised by Panagiota Fatourou and Themis Palpanas): Data Series Indexing Gone Parallel. In: ICDE PhD Workshop, (2020)
Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T.: Coconut: a scalable bottom-up approach for building data series indexes. PVLDB 11(6), 677–690 (2018)
Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T.: Coconut palm: Static and streaming data series exploration now in your palm. In: SIGMOD, (2019)
Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T.: Coconut: sortable summarizations for scalable indexes over static and streaming data series. VLDBJ 28(6), 847–869 (2019)
Kahveci, T., Singh, A.: Variable length queries for time series data. In: ICDE, (2001)
Rakthanmanon, T., Campana, B.J.L., Mueen, A., Batista, G. E. A.P.A., Westover, M.B., Zhu, Q., Zakaria, J., Keogh, E.J.: Searching and mining trillions of time series subsequences under dynamic time warping. In: SIGKDD, (2012)
Linardi, M., Zhu, Y., Palpanas, T., Keogh, E. J.: Matrix profile X: VALMOD—scalable discovery of variable-length motifs in data series. In: SIGMOD Conference (2018)
Linardi, M., Zhu, Y., Palpanas, T., Keogh, E. J.: VALMOD: A suite for easy and exact detection of variable length motifs in data series. In: SIGMOD Conference (2018)
Linardi, M., Zhu, Y., Palpanas, T., Keogh, E.J.: Matrix Profile Goes MAD: Variable-length motif and discord discovery in data series. In: DAMI, (2020)
Linardi, M. (supervised by Themis Palpanas): Effective and Efficient Variable-Length Data Series Analytics. In: VLDB PhD Workshop, (2019)
A.G.H. of Operational Intelligence Department Airbus. Personal communication., (2017)
Rosa, A.C., Parrino, L., Terzano, M.G.: Automatic detection of cyclic alternating pattern (cap) sequences in sleep: preliminary results. Clin. Neurophysiol. 110(4), 585–592 (1999)
Keogh, E.J., Kasetty, S.: On the need for time series data mining benchmarks: A survey and empirical demonstration. DAMI 7(4), 349–371 (2003)
Camerra, A., Palpanas, T., Shieh, J., Keogh, E.J.: isax 2.0: Indexing and mining one billion time series. In: ICDM (2010)
Linardi, M., Palpanas, T.: Scalable, variable-length similarity search in data series: The ULISSE approach. PVLDB 11(13), 2236–2248 (2018)
Linardi, M., Palpanas, T.: ULISSE: ULtra compact index for variable-length similarity SEarch in data series. In: ICDE (2018)
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. KAIS 3(3), 263–286 (2000)
Loh, W., Kim, S., Whang, K.: A subsequence matching algorithm that supports normalization transform in time-series databases. Data Min. Knowl. Discov. 9(1), 5–28 (2004)
Han, W., Lee, J., Moon, Y., Jiang, H.: Ranked subsequence matching in time-series databases. In: VLDB, (2007)
Wu, J., Wang, P., Pan, N., Wang, C., Wang, W., Wang, J.: Kv-match: A subsequence matching approach supporting normalization and time warping. In: ICDE, (2019)
Mueen, A., Hamooni, H., Estrada, T.: Time series join on subsequence correlation. In: ICDM, (2014)
Kruskal, J., Liberman, M.: The symmetric time-warping problem: From continuous to discrete. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, 01 (1983)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 46–49 (1978)
Itakura, F.: Minimum prediction residual principle applied to speech recognition. IEEE Trans. Acoust. Speech Signal Process. 23(1), 67–72 (1975)
Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing sax: a novel symbolic representation of time series. DAMI 15(2), 107–144 (2007)
Keogh, E.J., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)
Zoumpatianos, K., Lou, Y., Palpanas, T., Gehrke, J.: Query workloads for data series indexes. In: SIGKDD, (2015)
http://www.mi.parisdescartes.fr/~mlinardi/ULISSE.html
Zoumpatianos, K., Lou, Y., Ileana, I., Palpanas, T., Gehrke, J.: Generating data series query workloads. VLDB J. 27(6), 823–846 (2018)
Lichman, M.: UCI machine learning repository, (2013)
Terzano, M.G., Parrino, L., Sherieri, A., Chervin, R., Chokroverty, S., Guilleminault, C., Hirshkowitz, M., Mahowald, M., Moldofsky, H., Rosa, A., Thomas, R., Walters, A.: Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (cap) in human sleep. Sleep Med. 2(6), 537–553 (2001)
Healey, J.A., Picard, R.W.: Detecting stress during real-world driving tasks using physiological sensors. ITS 6(2), 156–166 (2005)
Soldi, S., Beckmann, V., Baumgartner, W.H., Ponti, G., Shrader, C.R., Lubinski, P., Krimm, H.A., Mattana, F., Tueller, J.: Long-term variability of agn at hard x-rays. Astronomy Astrophys. 563, A57 (2014)
IRIS. Seismic Data Access. http://ds.iris.edu/data/access, (2016)
Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.J.: The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. DAMI, (2017)
Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.J.: Experimental comparison of representation methods and distance measures for time series data. DAMI 26(2), 275–309 (2013)

Author information

Authors and Affiliations

LIPADE, Université de Paris, Paris, France
Michele Linardi & Themis Palpanas

Corresponding author

Correspondence to Michele Linardi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Linardi, M., Palpanas, T. Scalable data series subsequence matching with ULISSE. The VLDB Journal 29, 1449–1474 (2020). https://doi.org/10.1007/s00778-020-00619-4

Received: 09 June 2019
Revised: 31 January 2020
Accepted: 30 May 2020
Published: 04 July 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s00778-020-00619-4
