
FastEE: Fast Ensembles of Elastic Distances for time series classification

Data Mining and Knowledge Discovery

Abstract

In recent years, many new ensemble-based time series classification (TSC) algorithms have been proposed, each significantly more accurate than its predecessors. The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is currently the most accurate TSC algorithm when assessed on the UCR repository. It is a meta-ensemble of 5 state-of-the-art ensemble-based classifiers. The time complexity of HIVE-COTE, particularly for training, is prohibitive for most datasets. There is thus a critical need to speed up the classifiers that compose HIVE-COTE. This paper focuses on speeding up one of its components: the Ensembles of Elastic Distances (EE), the classifier that leverages decades of research into time-dedicated similarity measures. Training EE can be prohibitive for many datasets; for example, it takes a month on the ElectricDevices dataset with 9000 instances. This is because EE needs to cross-validate the hyper-parameters used for the 11 similarity measures it encompasses. In this work, Fast Ensembles of Elastic Distances (FastEE) is proposed to train EE faster. It comes in two versions. The exact version makes it possible to train EE 10 times faster. The approximate version is 40 times faster than EE without significantly impacting the classification accuracy. This translates to being able to train EE on ElectricDevices in 13 h.

References

  • Bagnall A, Lines J (2014) An experimental evaluation of nearest neighbour time series classification. Technical Report CMP-C14-01, Department of Computing Sciences, University of East Anglia

  • Bagnall A, Lines J, Hills J, Bostrom A (2015) Time-series classification with COTE: the collective of transformation-based ensembles. IEEE Trans Knowl Data Eng 27(9):2522–2535

  • Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660

  • Boreczky JS, Rowe LA (1996) Comparison of video shot boundary detection techniques. J Electron Imaging 5(2):122–129

  • Chen L, Ng R (2004) On the marriage of Lp-norms and edit distance. In: Proceedings of the 30th international conference on very large databases (VLDB), pp 792–803

  • Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data (SIGMOD), pp 491–502

  • Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/

  • Dau H, Silva D, Petitjean F, Bagnall A, Keogh E (2017) Judicious setting of dynamic time warping’s window width allows more accurate classification of time series. In: Proceedings of the 2017 IEEE international conference on big data (Big Data), pp 917–922

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the 34th international conference on very large data bases (VLDB), pp 1542–1552

  • Flynn M, Large J, Bagnall T (2019) The contract random interval spectral ensemble (c-RISE): the effect of contracting a classifier on accuracy. In: Proceedings of 2019 international conference on hybrid artificial intelligence systems (HAIS), pp 381–392

  • Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Discov 28(4):851–881

  • Inglada J, Arias M, Tardy B, Hagolle O, Valero S, Morin D, Dedieu G, Sepulcre G, Bontemps S, Defourny P, Koetz B (2015) Assessment of an operational system for crop type map production using high temporal and spatial resolution satellite optical imagery. Remote Sens 7(9):12356–12379

  • Inglada J, Vincent A, Arias M, Marais-Sicre C (2016) Improved early crop type identification by joint use of high temporal resolution sar and optical image time series. Remote Sens 8(5):362

  • Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process 23(1):67–72

  • Jeong YS, Jeong MK, Omitaomu OA (2011) Weighted dynamic time warping for time series classification. Pattern Recogn 44(9):2231–2240

  • Keogh E, Ratanamahatana C (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386

  • Keogh EJ, Pazzani MJ (2001) Derivative dynamic time warping. In: Proceedings of the 2001 SIAM international conference on data mining (SDM), pp 1–11

  • Kim SW, Park S, Chu WW (2001) An index-based approach for similarity search supporting time warping in large sequence databases. In: Proceedings of the 17th international conference on data engineering (ICDE), pp 607–614

  • Lemire D (2009) Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recogn 42(9):2169–2180

  • Lines J, Bagnall A (2015) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592

  • Lines J, Taylor S, Bagnall A (2016) HIVE-COTE: The hierarchical vote collective of transformation-based ensembles for time series classification. In: Proceedings of the 16th IEEE international conference on data mining (ICDM), pp 1041–1046

  • Lucas B, Shifaz A, Pelletier C, O’Neill L, Zaidi N, Goethals B, Petitjean F, Webb GI (2019) Proximity forest: an effective and scalable distance-based classifier for time series. Data Min Knowl Discov 33(3):607–635

  • Marteau PF (2009) Time warp edit distance with stiffness adjustment for time series matching. IEEE Trans Pattern Anal Mach Intell 31(2):306–318

  • Petitjean F, Inglada J, Gançarski P (2012) Satellite image time series analysis under time warping. IEEE Trans Geosci Remote Sens 50(8):3081–3095

  • Petitjean F, Forestier G, Webb GI, Nicholson AE, Chen Y, Keogh E (2014) Dynamic time warping averaging of time series allows faster and more accurate classification. In: Proceedings of the 2014 IEEE international conference on data mining (ICDM), pp 470–479

  • Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), pp 262–270

  • Ratanamahatana C, Keogh E (2005) Three myths about DTW data mining. In: Proceedings of the 2005 SIAM international conference on data mining (SDM), pp 506–510

  • Ratanamahatana CA, Keogh E (2004) Making time-series classification more accurate using learned constraints. In: Proceedings of the 2004 SIAM international conference on data mining, pp 11–22

  • Sakoe H, Chiba S (1971) A dynamic programming approach to continuous speech recognition. In: Proceedings of the 7th international congress on acoustics, Budapest, Hungary, vol 3, pp 65–69

  • Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49

  • Shen Y, Chen Y, Keogh E, Jin H (2018) Accelerating time series searching with large uniform scaling. In: Proceedings of the 2018 SIAM international conference on data mining (SDM), pp 234–242

  • Silva D, Batista G (2016) Speeding up all-pairwise dynamic time warping matrix calculation. In: Proceedings of the 2016 SIAM international conference on data mining (SDM), pp 837–845

  • Srikanthan S, Kumar A, Gupta R (2011) Implementing the dynamic time warping algorithm in multithreaded environments for real time and unsupervised pattern discovery. In: Proceedings of the 2nd international conference on computer and communication technology (ICCCT), pp 394–398

  • Stefan A, Athitsos V, Das G (2013) The move-split-merge metric for time series. IEEE Trans Knowl Data Eng 25(6):1425–1438

  • Tan CW, Webb GI, Petitjean F (2017) Indexing and classifying gigabytes of time series under time warping. In: Proceedings of the 2017 SIAM international conference on data mining (SDM), pp 282–290

  • Tan CW, Herrmann M, Forestier G, Webb GI, Petitjean F (2018) Efficient search of the best warping window for dynamic time warping. In: Proceedings of the 2018 SIAM international conference on data mining (SDM), pp 225–233

  • Tan CW, Petitjean F, Webb GI (2019) Elastic bands across the path: a new framework and methods to lower bound DTW. In: Proceedings of the 2019 SIAM international conference on data mining (SDM), pp 522–530

  • Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings of the 18th international conference on data engineering (ICDE), pp 673–684

  • Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), pp 216–225

  • Yi BK, Jagadish H, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: Proceedings of the 14th international conference on data engineering (ICDE), pp 201–208

Acknowledgements

This research was supported by the Australian Research Council under Grant DP190100017. François Petitjean is the recipient of an Australian Research Council Discovery Early Career Award (Project Number DE170100037) funded by the Australian Government. This material is based upon work supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under award number FA2386-18-1-4030. The authors would like to acknowledge the use of the UCR Time Series Classification archive that is made publicly available for time series classification benchmarks. We also would like to acknowledge the use of the source code for Ensemble of Elastic Distances that is freely available at http://www.timeseriesclassification.com/.

Author information

Corresponding author

Correspondence to Chang Wei Tan.

Additional information

Responsible editor: Johannes Fürnkranz.

Appendix A: Existing lower bounds for elastic distances

1.1 A.1 DTW lower bounds

As DTW is the most popular elastic distance, its lower bounds have been widely studied (Yi et al. 1998; Kim et al. 2001; Keogh and Ratanamahatana 2005; Lemire 2009; Shen et al. 2018). Note that DDTW is a variant of DTW, so the lower bounds for DTW are directly applicable to DDTW.

Fig. 16: Illustration of (a) the Kim and (b) the Keogh lower bounds

The simplest and loosest DTW lower bound is the Kim lower bound (LB_Kim), described in Eq. 14 (Kim et al. 2001). LB_Kim uses the maximum of the differences between the first, last, maximum and minimum points of Q and C as a lower bound for DTW. After initialisation, LB_Kim can be computed in O(1) time. Although loose, it is still effective at filtering out obviously unpromising candidates. Figure 16a illustrates this lower bound.

$$\begin{aligned} \textsc {LB\_Kim}(Q,C) = \max {\left\{ \begin{array}{ll} |q_1 - c_1| \\ |q_L - c_L| \\ |\max (Q)- \max (C)| \\ |\min (Q) - \min (C)| \end{array}\right. } \end{aligned}$$
(14)
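
As an illustration, Eq. 14 amounts to a few comparisons over pre-computed extrema; the following Python sketch is not from the paper's code, and `lb_kim` is an illustrative name:

```python
import numpy as np

def lb_kim(q, c):
    """LB_Kim sketch (Eq. 14): largest of the differences between the
    first, last, maximum and minimum points of Q and C."""
    return max(abs(q[0] - c[0]),
               abs(q[-1] - c[-1]),
               abs(q.max() - c.max()),
               abs(q.min() - c.min()))

q = np.array([0.0, 1.0, 2.0, 3.0])
c = np.array([1.0, 1.0, 1.0, 1.0])
print(lb_kim(q, c))  # 2.0, dominated by |q_L - c_L| = 2
```

In practice the extrema of each series are cached once, which is what makes the per-comparison cost O(1).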

The Keogh lower bound (LB_Keogh) (Keogh and Ratanamahatana 2005) is arguably the most widely used lower bound for DTW, owing to its simplicity and medium-high tightness. First, it creates two envelopes encapsulating the candidate time series: the upper envelope (UE) is built by taking the maximum within a warping window of size \(r\), and the lower envelope (LE) by taking the minimum, as shown in Eq. 15.

$$\begin{aligned} \begin{matrix} UE_i = \max (c_{i-r}:c_{i+r}) \\ LE_i = \min (c_{i-r}:c_{i+r}) \end{matrix} \end{aligned}$$
(15)

The LB_Keogh distance between Q and C is then the Euclidean distance from every point of Q falling outside the envelopes UE and LE to the nearest envelope, as described in Eq. 16. Figure 16b illustrates LB_Keogh, where the sum of the lengths of the green lines is the LB_Keogh distance.

$$\begin{aligned} \textsc {LB\_Keogh}{}(Q,C) = \sqrt{\sum _{i=1}^{L}{ {\left\{ \begin{array}{ll} (q_i-UE_i)^2 &{} \quad \text {if } q_i > UE_i\\ (q_i-LE_i)^2 &{}\quad \text {if } q_i < LE_i\\ 0 &{}\quad \text {otherwise} \end{array}\right. }}} \end{aligned}$$
(16)
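
Equations 15 and 16 can be sketched together in NumPy as follows (an illustrative implementation, not the paper's; `envelope` and `lb_keogh` are hypothetical names, and window boundaries are clipped to the series length):

```python
import numpy as np

def envelope(c, r):
    """Eq. 15: upper (UE) and lower (LE) envelopes of candidate c
    under a warping window of size r."""
    L = len(c)
    ue = np.array([c[max(0, i - r):i + r + 1].max() for i in range(L)])
    le = np.array([c[max(0, i - r):i + r + 1].min() for i in range(L)])
    return ue, le

def lb_keogh(q, c, r):
    """Eq. 16: Euclidean distance from the points of q that fall
    outside the envelopes of c to the nearest envelope."""
    ue, le = envelope(c, r)
    above = np.clip(q - ue, 0.0, None)   # contribution where q_i > UE_i
    below = np.clip(le - q, 0.0, None)   # contribution where q_i < LE_i
    return np.sqrt(np.sum(above ** 2 + below ** 2))
```

The `np.clip` calls implement the three-way case of Eq. 16: a point inside the envelope contributes zero to the sum.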

There are more sophisticated lower bounds that are tighter than LB_Keogh but have higher computational overheads. The Improved lower bound (LB_Improved) (Lemire 2009) performs LB_Keogh in two passes: the first pass computes the standard LB_Keogh\((Q,C)\) on the query, and the second pass computes LB_Keogh\((Q',C)\) on the projection \(Q'\) of the query onto the envelopes. The New lower bound (LB_New) (Shen et al. 2018) takes advantage of the boundary and continuity conditions of the DTW warping path to create a tighter lower bound. The boundary condition requires that every warping path contains \((q_1,c_1)\) and \((q_L,c_L)\). The continuity condition ensures that every \(q_i\) is paired with at least one \(c_j'\), where \(j\in \lbrace \max (1,i-r)\ldots \min (L,i+r)\rbrace \). The authors sort the points \(c_j'\) and perform a binary search whenever \(q_i\) lies within the maximum and minimum of the \(c_j'\).
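
The two-pass idea behind LB_Improved can be sketched as below. This is an illustration under our reading of Lemire (2009), not his original code: the squared contributions of the two passes are summed before the final square root, to stay consistent with the square root in Eq. 16.

```python
import numpy as np

def envelope(x, r):
    # Eq. 15 envelopes, window boundaries clipped to the series length
    L = len(x)
    ue = np.array([x[max(0, i - r):i + r + 1].max() for i in range(L)])
    le = np.array([x[max(0, i - r):i + r + 1].min() for i in range(L)])
    return ue, le

def keogh_sq(q, ue, le):
    # squared LB_Keogh contribution: the sum under the root in Eq. 16
    return np.sum(np.clip(q - ue, 0.0, None) ** 2 +
                  np.clip(le - q, 0.0, None) ** 2)

def lb_improved(q, c, r):
    """Two-pass sketch of LB_Improved (Lemire 2009)."""
    ue_c, le_c = envelope(c, r)
    first = keogh_sq(q, ue_c, le_c)     # pass 1: LB_Keogh(Q, C)
    q_proj = np.clip(q, le_c, ue_c)     # projection Q' of Q onto the envelopes
    ue_p, le_p = envelope(q_proj, r)
    second = keogh_sq(c, ue_p, le_p)    # pass 2: LB_Keogh(Q', C), roles swapped
    return np.sqrt(first + second)
```

Since the second pass only ever adds non-negative terms, this bound is at least as tight as LB_Keogh alone, at roughly twice the cost.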

1.2 A.2 ERP lower bounds

DTW lower bounds can be adapted for the ERP distance by taking into account ERP's penalty parameter g (Chen and Ng 2004). Equation 17 describes LB_Kim for ERP, accounting for the possibility that the first and last points are gaps, where \(q'_1=q_1\) or g, \(q'_L=q_L\) or g, \(Q_{\max }'=\max (Q_{\max },g)\) and \(Q_{\min }'=\min (Q_{\min },g)\). The same applies to the candidate time series C.

$$\begin{aligned} \textsc {LB\_Kim}_{\textsc {ERP}{}}(Q,C) = \max {\left\{ \begin{array}{ll} |q_1' - c_1'| \\ |q_L' - c_L'| \\ |Q_{\max }'- C_{\max }'| \\ |Q_{\min }' - C_{\min }'| \end{array}\right. } \end{aligned}$$
(17)

Similarly, to compute LB_Keogh for ERP (\(\textsc {LB\_Keogh}_\textsc {ERP}\)), the envelopes need to be adjusted so that the maximum and minimum values include the g parameter. Equation 18 describes these new envelopes; note that \(\texttt {bandsize}\) is used instead of \(r\). LB_Keogh for ERP is then computed exactly as LB_Keogh for DTW in Eq. 16, substituting the ERP envelopes.

$$\begin{aligned} \begin{matrix} UE'_i = \max (g, \max (c_{i-\texttt {bandsize}}:c_{i+\texttt {bandsize}})) \\ LE'_i = \min (g, \min (c_{i-\texttt {bandsize}}:c_{i+\texttt {bandsize}})) \end{matrix} \end{aligned}$$
(18)
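
A sketch of LB_Keogh for ERP with the widened envelopes of Eq. 18 (illustrative Python, not the paper's implementation; `lb_keogh_erp` is a hypothetical name):

```python
import numpy as np

def lb_keogh_erp(q, c, bandsize, g):
    """Eq. 18: envelopes widened to include the gap value g, followed by
    the unchanged LB_Keogh computation of Eq. 16."""
    L = len(c)
    ue = np.array([max(g, c[max(0, i - bandsize):i + bandsize + 1].max())
                   for i in range(L)])
    le = np.array([min(g, c[max(0, i - bandsize):i + bandsize + 1].min())
                   for i in range(L)])
    above = np.clip(q - ue, 0.0, None)
    below = np.clip(le - q, 0.0, None)
    return np.sqrt(np.sum(above ** 2 + below ** 2))
```

Widening the envelopes with g can only loosen them, which keeps the bound valid when ERP aligns a point against a gap.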

All the previous lower bounds were developed specifically for DTW. Thus, the authors of ERP (Chen and Ng 2004) developed LB_ERP, a lower bound specific to ERP. Setting \(g=0\), LB_ERP is defined in Eq. 19 as the absolute difference between the sums of the two time series. The authors showed that LB_ERP has better pruning power than LB_Keogh\(_{\textsc {ERP}{}}\). However, LB_ERP is currently only defined for \(g=0\), with no proof for \(g\ne 0\). We therefore only use the LB_Keogh version for ERP in our work.

$$\begin{aligned} \textsc {LB\_ERP}(Q,C)=\left| {\sum Q - \sum C}\right| \end{aligned}$$
(19)
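
Equation 19 needs only a single pass over each series (and the sums can be cached per series); a sketch:

```python
def lb_erp(q, c):
    """LB_ERP sketch (Eq. 19), valid only for g = 0: absolute
    difference of the sums of the two series."""
    return abs(sum(q) - sum(c))

print(lb_erp([0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]))  # 2.0
```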

1.3 A.3 LCSS lower bound

The core of LCSS is the length of the longest common subsequence of two time series; the distance is then the percentage of points that are not a match, i.e. whose pairwise difference is larger than \(\varepsilon \). Recall that LCSS also uses a local constraint \(\varDelta \). Using a similar idea to LB_Keogh, constructing an envelope around the candidate time series C, a lower bound for the LCSS distance was proposed in Vlachos et al. (2003). The envelope for C is constructed using \(\varepsilon \) and \(\varDelta \) as described in Eq. 20.

$$\begin{aligned} \begin{array}{l} \mathbb {UE}_i = \max (c_{i-\varDelta }:c_{i+\varDelta }) + \varepsilon \\ \mathbb {LE}_i = \min (c_{i-\varDelta }:c_{i+\varDelta }) - \varepsilon \end{array} \end{aligned}$$
(20)

The number of points \(q_i\in Q\) lying within the envelope, divided by L, gives an upper bound (UB) on the LCSS similarity. The lower bound distance for LCSS (LB_LCSS) is then \(1-\text {UB}\), defined in Eq. 21 as the fraction of points that are not within the envelope.

$$\begin{aligned} \textsc {LB\_LCSS}{}(Q,C) = 1 - \frac{1}{L}\sum _{i=1}^{L}{ {\left\{ \begin{array}{ll} 1 &{} \quad \text {if } \mathbb {LE}_i \le q_i \le \mathbb {UE}_i\\ 0 &{}\quad \text {otherwise} \end{array}\right. }} \end{aligned}$$
(21)
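
A sketch of LB_LCSS in Python (illustrative, not the paper's code): the lower envelope is taken as the windowed minimum minus \(\varepsilon \), so that the test \(\mathbb {LE}_i \le q_i \le \mathbb {UE}_i\) of Eq. 21 captures every point that could match within \(\varepsilon \).

```python
import numpy as np

def lb_lcss(q, c, delta, eps):
    """Eqs. 20-21 sketch: fraction of points of q falling outside the
    (delta, eps)-envelope of c."""
    L = len(c)
    ue = np.array([c[max(0, i - delta):i + delta + 1].max() + eps
                   for i in range(L)])
    le = np.array([c[max(0, i - delta):i + delta + 1].min() - eps
                   for i in range(L)])
    inside = np.count_nonzero((q >= le) & (q <= ue))  # UB on the LCS length
    return 1.0 - inside / L
```

Because every true match must lie inside the envelope, `inside` can only overestimate the longest common subsequence, so the returned value never exceeds the LCSS distance.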

About this article

Cite this article

Tan, C.W., Petitjean, F. & Webb, G.I. FastEE: Fast Ensembles of Elastic Distances for time series classification. Data Min Knowl Disc 34, 231–272 (2020). https://doi.org/10.1007/s10618-019-00663-x
