Abstract

As a representation of discriminative features, the time series shapelet has recently received considerable research interest. However, most shapelet-based classification models evaluate the discriminative ability of a shapelet on the whole training dataset, neglecting both the characteristic information contained in each instance to be classified and the classwise feature frequency information. Hence, the computational complexity of feature extraction is high, and the interpretability is inadequate. To this end, the efficiency of shapelet discovery is improved through a lazy strategy fusing global and local similarities. In the prediction process, the strategy learns a specific evaluation dataset for each instance, and the captured characteristics are then used directly to progressively reduce the uncertainty of the predicted class label. Moreover, a shapelet coverage score is defined to quantify the discriminability of each time stamp for different classes. The experimental results show that the proposed method is competitive with the benchmark methods and provides insight into the discriminative features of each time series and each class in the data.

1. Introduction

In recent years, massive time series data have been generated in many fields, including weather forecasting [1], malware detection [2], voltage stability assessment [3], human identification [4], and biomedicine [5]. Consequently, the study of time series has found wide application, and classification is an important research topic within it. The time series classification problem is the same as the traditional classification problem: we wish to find a function that can map any time series to a target class label. Although a large number of time series classification algorithms have been proposed, extensive experiments show that the 1NN classifier combined with different distance metrics is still a competitive model in many problem areas [6–12]. In addition to the common Euclidean distance, alternatives have been proposed to measure the similarity between time series, including dynamic time warping (DTW) [13], weighted DTW (WDTW) [14], edit distance with real penalty (ERP) [15], time warp edit (TWE) [16], and move-split-merge (MSM) [17].

The improved distance functions can help boost the performance of the nearest-neighbor model, but the 1NN classifier presents obvious drawbacks. This classifier cannot indicate the common characteristics of similar instances or the dissimilarities between different classes. In other words, its interpretability is insufficient. In practice, beyond accuracy, the features of distinct instances are also our concern. These features provide a deeper understanding of the data and improve the interpretability of the classification model. Unfortunately, time series usually have no predefined features. Hence, various feature prototypes have been proposed to mine potential patterns of time series [9, 18, 19]. Among them, the most classic is the local discriminative feature, the “shapelet” [18].

A shapelet (as shown in Figure 1(a)) is a specially discriminative subsequence of a time series, originally applied to construct the shapelet-based decision tree (SDT) [18] by recursively searching for the best shapelet in the training set. Since the shapelet can be used to establish an interpretable classification model, it has been widely studied [9]. Shapelet-based classification models can be divided into two categories. One type utilizes the top-k shapelets to create a transformed dataset, to which traditional classification algorithms [20–24] can be applied. The other uses the shapelets to build the classification model directly [18, 25–28].

One major problem that all shapelet-based approaches generally face is the massive size of the candidate shapelet set. To solve this problem, researchers have put forward several methods, which can be roughly divided into four categories. (1) Training instances are selected to generate the candidate shapelets. For example, Ji et al. [29] put forward a subclass splitting method to sample the training instances for candidate shapelet generation. (2) Heuristic shapelet search methods. Grabocka et al. [27] presented a heuristic gradient descent shapelet search algorithm, which created a smaller candidate shapelet set. Rakthanmanon et al. [26] proposed a fast shapelet discovery algorithm based on Symbolic Aggregate approXimation (SAX). Similarly, Fang et al. [28] introduced a novel method to search shapelets based on piecewise aggregate approximation (PAA). (3) A random selection mechanism is used to select shapelets. Renard et al. [30] first proposed a random-shapelet algorithm to build decision trees. Karlsson et al. [31] constructed the shapelet-based random forest, in which each decision tree is built on randomly selected instances and shapelets. Further, to omit the shapelet threshold search, Shi et al. [32] put forward the random pairwise shapelet forest. (4) Reformulating the shapelet search problem into a numerical optimization problem. For example, Hou et al. [33] treated the shapelet search task as a numerical optimization problem and learned the shapelets by numerical analysis methods. Likewise, Wang et al. [34] designed a semisupervised shapelet learning model, which transforms the feature search problem into a joint optimization problem. Ma et al. [35] proposed an end-to-end model to learn the most discriminative shapelets by the gradient descent method. Zhao et al. [36] recently proposed a regularized shapelet learning framework to improve the shapelet learning efficiency. Although the above methods improve the classification efficiency of shapelet-based models to some extent, the vast majority of shapelet-based global classification models still have the following disadvantages:
(1) The shapelets captured by most shapelet-based models cannot adequately reflect the feature distribution and frequency information of each class in the dataset. For example, owing to intraclass variation, a few instances in different classes may have low-frequency discriminative features.
(2) The whole training set is generally used for the discriminatory evaluation of candidate shapelets. Owing to the influence of redundant instances and intraclass variability, the extracted shapelets are merely the best on average over the training dataset and cannot accurately reflect the local characteristics of the instance to be classified. In other words, the established shapelet-based model is neither suitable nor efficient for each test instance, and targeted evaluation strategies have not received enough attention.

To address these problems, we first proposed a lazy shapelet-based model to capture the local features of each instance in the literature [37] (a poster at the conference ICONIP 2018, in which we simply proposed a lazy model to classify an instance based on its own local features). The present paper is an extended version that further studies the fusion of global and local similarity and the discovery of local feature distribution and frequency information. In addition, it details how to discover the local features of each instance to be classified in the shapelet-based model and how to use the local characteristics of each instance to determine the classwise discriminatory information, and it includes more experiments on parameter setting, statistical analysis, model comparison, and case studies. The model in [37] still cannot gain insight into the feature distribution and frequency information of the time series data. Therefore, we significantly extend the research on the data-driven, shapelet-based model (lazy shapelet classification route, LSCR) to study feature distribution and frequency information discovery. Here, the advantages of our model are interpreted in conjunction with Figure 1. From Figure 1(b), it can be found that the heterogeneous instances are different and that there are also differences between homogeneous instances. Since LSCR performs a targeted analysis of each instance, compared with the SDT shapelet (as shown in Figure 1(a)), our model may capture characteristics that SDT cannot discover; for example, some of the shapelets in Figure 1(b) do not appear in the model built by SDT. Further, to evaluate the classwise discriminative feature frequency, the shapelet coverage score is defined. From Figure 1(c), we find that the scores can not only indicate the local discriminant intervals for different classes but also reflect their frequency information. For instance, the low-frequency discriminative interval [0, 7] reflects the location of the local features detected by our model on a few instances of Class2, which are usually caused by intraclass variation and ignored by the global shapelet-based model. The main contributions of this paper are summarized as follows:
(1) In contrast with the classical kNN or 1NN model based on global similarity, our model fuses global and local similarities. For global similarity, an instance selection strategy is used in evaluating the discrimination of shapelets; the smaller evaluation dataset eliminates the interference of intraclass variation and improves the classification performance. In addition, local similarity is applied instead of global similarity for prediction, which makes the proposed model more interpretable.
(2) To reduce the massive number of redundant candidate shapelets generated by the brute-force algorithm, a novel strategy is proposed for extracting candidate shapelets from the instance to be classified. This strategy guarantees that the extracted shapelets accurately reflect the local characteristics of each test instance.
(3) The shapelet coverage score of each sampling point is calculated to analyze the local characteristics of different classes in the dataset. Since the proposed model can efficiently analyze the local features of each instance, more accurate local characteristic information can be obtained.
In particular, the classwise discriminative feature frequency and distribution information can be presented, which can help us to understand the data more comprehensively.

The remainder of the paper is organized as follows. Section 2 introduces related concepts and basic theories. Section 3 describes the proposed model and algorithm design in detail. Section 4 presents the experimental analysis. Section 5 offers the conclusion of this paper.

2. Definitions and Notation

In this section, some definitions and formulas related to our model will be presented.

Definition 1. (time series). A time series $T$ is an ordered sequence of $m$ real-valued observations $t_1, t_2, \ldots, t_m$, i.e., $T = \{t_1, t_2, \ldots, t_m\}$, $t_i \in \mathbb{R}$. The symbol $D$ represents a dataset containing $n$ time series.

Definition 2. (time series subsequence and shapelet). Given a time series $T$, a subsequence $S$ of $T$ contains $l$ consecutive values from $T$; that is, $S = \{t_i, t_{i+1}, \ldots, t_{i+l-1}\}$, where $1 \le i \le m - l + 1$. A shapelet is a tuple $(S, \delta)$ that consists of a subsequence $S$ and a distance threshold $\delta$.

Definition 3. (candidate shapelet set). The candidate shapelet set is composed of subsequences of time series.
The symbol $D'$ denotes the dataset corresponding to an arbitrary tree node in the model SDT or LSCR, and $W_l$ is the set of candidate shapelets with length $l$ built on the dataset $D'$. $W_l$ in SDT could be represented as
$$W_l = \bigcup_{T_i \in D'} S_l^{T_i},$$
where $S_l^{T_i}$ denotes the set of subsequences with length $l$ from $T_i$.
However, in our work, since the best shapelets are searched from the subsequence space of the instance to be classified, the set of shapelet candidates of length $l$ for each node in LSCR is
$$W_l = S_l^{T},$$
where $T$ is the test instance.
Then, the whole candidate shapelet set for each node in SDT and LSCR can be obtained through the following equation:
$$W = \bigcup_{l = l_{\min}}^{l_{\max}} W_l,$$
where $l_{\min}$ and $l_{\max}$ are the minimum and maximum candidate lengths, respectively.
Therefore, compared with SDT, LSCR reduces the scale of candidate shapelets in a single node by an order of magnitude.

Definition 4. (similarity of equal-length time series). Let $\mathrm{dist}(T, R)$ be a similarity function of time series, which takes two time series $T$ and $R$ with equal length as the input. The function returns a nonnegative value, which represents the similarity degree.
Generally, the smaller the distance is between two time series, the more similar the two time series are. In reality, we often need to judge the similarity between unequal time series. For example, in our work, we need to determine whether a time series contains a specific local feature through the distance between a subsequence and a whole time series.

Definition 5. (similarity of unequal-length time series). Let $\mathrm{subdist}(T, S)$ be a similarity function, in which time series $T$ and $S$ have different lengths. The function returns a nonnegative optimal matching distance between the two sequences as the degree of similarity. The distance between time series $T$ and sequence $S$ is
$$\mathrm{subdist}(T, S) = \min_{R \in S_{l_S}^{T}} \mathrm{dist}(R, S),$$
where the symbol $S_{l_S}^{T}$ represents the set of subsequences with length $l_S$ of time series $T$ ($l_S$ and $l_T$ denote the lengths of sequences $S$ and $T$, respectively, and $l_S \le l_T$).
In our model, the distance between subsequences with equal length is calculated by the Euclidean distance, while the distance between complete time series is measured by the specified distance function.
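For concreteness, the following Python sketch implements Definitions 4 and 5 under these conventions (the function names dist and subdist are our own illustrative choices, not from a published implementation):

import numpy as np

def dist(a, b):
    # Euclidean distance between two equal-length sequences (Definition 4)
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def subdist(t, s):
    # Distance between a time series t and a shorter sequence s (Definition 5):
    # the minimum distance over all subsequences of t with length len(s)
    l = len(s)
    return min(dist(t[i:i + l], s) for i in range(len(t) - l + 1))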
In reality, different subsequences may have disparate discriminability. In our work, information gain is used to measure the discrimination of shapelets. To reduce distance computation and improve the shapelet discriminant property, the concept of the evaluation dataset is put forward.

Definition 6. (shapelet evaluation dataset). The shapelet evaluation dataset is a specific subset of the training dataset, which is designed to evaluate the discriminability of local features of each test case.

Definition 7. (entropy of dataset). The entropy of a given dataset $D$ is calculated by the following formula:
$$E(D) = -\sum_{c \in C} p_c \log_2 p_c,$$
where $c$ is an element in the class value set $C$ of $D$, $D_c$ is the subset of instances with class $c$ in $D$, and the proportion $p_c$ is calculated by
$$p_c = \frac{|D_c|}{|D|}.$$

Definition 8 (shapelet information gain). Given a shapelet $S$ and a dataset $D$ containing instances with different classes, the information gain of $S$ is calculated as follows:
$$\mathrm{Gain}(S, s) = E(D) - \frac{|D_L|}{|D|}E(D_L) - \frac{|D_R|}{|D|}E(D_R),$$
where $s$ is a split distance that can be applied to divide the dataset $D$ into two subsets: $D_L = \{T \in D : \mathrm{subdist}(T, S) \le s\}$ and $D_R = \{T \in D : \mathrm{subdist}(T, S) > s\}$.
Finally, the maximum value of the information gain is normally treated as the shapelet discrimination, and the corresponding distance is taken as the threshold $\delta$. The split distance $s$ usually takes the middle distance between two distance points. The detailed calculation process can be found in the literature [18]. In our model, the shapelet information gain calculated by equation (7) on a specific evaluation dataset reflects the reduction in uncertainty in the predicted class label of the test instance. Here, the mathematical description of our model is introduced.
Given a specific test instance $T$, its uncertainty of predicted class in our model is
$$U_0(T) = E(D_0),$$
where $D_0$ is the initial evaluation dataset of $T$.
In the prediction process, the proposed model LSCR tries to progressively reduce the uncertainty through the instance's own characteristics. The model can be formulated as
$$S_i = \arg\max_{S} \mathrm{Gain}(S, D_{i-1}),$$
where $S_i$ denotes the $i$th element in the learned shapelet set $\mathcal{S}$ of $T$.
Additionally, $D_i$ is the corresponding evaluation dataset of $S_i$, which is determined by
$$D_i = \{T' \in D_{i-1} : \mathrm{subdist}(T', S_i) \le \delta_i\}.$$
Finally, the majority class of the final dataset would be taken as the predicted class value of $T$. Generally, there will be only one type of instance left in the final evaluation dataset.
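As an illustration of Definitions 7 and 8, a minimal Python sketch follows (our own code; labels holds the class labels of the evaluation dataset, and distances[i] is assumed precomputed as subdist(T_i, S)):

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels (Definition 7)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(distances, labels, split):
    # Information gain of splitting the dataset at distance `split` (Definition 8)
    left = [y for d, y in zip(distances, labels) if d <= split]
    right = [y for d, y in zip(distances, labels) if d > split]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))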

3. Targeted Shapelet Extraction Technique

3.1. Overview

To provide a brief introduction to the model LSCR, a schematic diagram is first presented in Figure 2. As shown in the figure, the evaluation dataset for the test instance T is first generated; second, the candidate shapelets from T are evaluated on this dataset, and the best shapelet is then employed to exclude the instances that do not contain the local feature it represents; third, the search for the optimal shapelet on the reduced dataset continues until termination. Generally, there will be only one class of instances left in the final dataset, and that class will be taken as the predicted value.

The predicted result may be different from the class value of the nearest neighbor or the majority class in the initial dataset. This is the greatest difference between our model and the nearest-neighbor models. Here, the model will be described in detail.

3.2. Building Shapelet Evaluation Dataset

From the perspective of information theory, the purpose of extracting the best shapelet is to minimize the uncertainty of the class label of the test instance. The uncertainty is reflected in the class distribution of each subdataset generated based on whether the instance contains the specific feature. Hence, the more unbalanced the distribution of the subset of instances selected based on the shapelet, the more discriminative the feature. For example, for a binary classification problem, it is ideal to use the shapelet to divide the dataset containing instances with different class values into two subsets, each of which contains only one type of instance.

For a large-scale dataset, the running time of searching for shapelets is unbearable, so we attempt to sample the training instances for shapelet evaluation. In this paper, instances are selected based on the neighbor distance and the class value of the closest neighbor. Moreover, if the nearest-neighbor instances in the neighborhood corresponding to the initial node of the classification path all belong to the same class, then the route degrades into a single node. In particular, when the neighborhood size is set to 1, the model degenerates into 1NN, and discriminative features cannot be extracted effectively. In view of these problems, we propose to build a small targeted subset that contains instances with different classes for the instance to be classified. This subset ensures that the distinguishing nature of the local features can be evaluated. In addition, the data sampling strategy can eliminate the impact of intraclass variation on the local feature discriminant evaluation.

As shown in Algorithm 1, according to the class value of the nearest-neighbor instance, we select the k nearest homogeneous and k nearest heterogeneous instances for the test instance to construct a shapelet evaluation dataset (lines 2–6).

Input: training dataset: D; test instance: T; the number of homogeneous and heterogeneous instances: k; the distance function used to calculate the similarity between complete time series: dist.
Output: the shapelet evaluation dataset: shapeletEvaluationData.
(1) shapeletEvaluationData ← ∅
(2) double c ← getNearestNeighborClass(D, T)
(3) homogeneityData ← getTopKHomogeneityNearestNeighbors(D, T, k, c)
(4) heterogeneousData ← getTopKHeterogeneousNearestNeighbors(D, T, k, c)
(5) shapeletEvaluationData ← homogeneityData ∪ heterogeneousData
(6) return shapeletEvaluationData
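A possible Python rendering of Algorithm 1 is sketched below (illustrative only; train is assumed to be a list of (series, label) pairs, and dist any full-series distance such as ED or DTW):

def build_evaluation_dataset(train, test_instance, k, dist):
    # Rank all training instances by their distance to the test instance
    ranked = sorted(train, key=lambda pair: dist(pair[0], test_instance))
    c = ranked[0][1]  # class value of the nearest neighbor (line 2)
    # k nearest homogeneous and k nearest heterogeneous instances (lines 3-4)
    homo = [pair for pair in ranked if pair[1] == c][:k]
    hetero = [pair for pair in ranked if pair[1] != c][:k]
    return homo + hetero  # the shapelet evaluation dataset (line 5)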
3.3. Finding the Optimal Shapelet

To reduce the computational complexity of extracting the best shapelet and to make the extracted shapelets better reflect the characteristics of the test instance, a data-driven shapelet search algorithm is further proposed. We search for the best shapelets only in the subsequence space of the instance to be classified so that the extracted shapelets accurately reflect the local features of each test instance. The process of generating the candidate shapelet collection for the test instance is given in Algorithm 2. In the algorithm, every subsequence of T with starting point i and length j is added to the candidate shapelet set (line 4).

Input: test instance: T; the minimum and maximum length of the shapelet: min and max.
Output: the candidate shapelets set: CandidatesSet.
(1) CandidatesSet ← ∅
(2) for each i = 1 to m − min + 1 do
(3)   for each j = min to max do
(4)     S ← generateCandidate(T, i, j)
(5)     CandidatesSet.add(S)
(6)   end for
(7) end for
(8) return CandidatesSet
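In Python, Algorithm 2 amounts to a double loop over starting positions and candidate lengths (a sketch with our own naming):

def generate_candidates(t, min_len, max_len):
    # All subsequences of t with lengths in [min_len, max_len] (Algorithm 2)
    m = len(t)
    return [t[i:i + j]
            for i in range(m - min_len + 1)       # starting position
            for j in range(min_len, max_len + 1)  # candidate length
            if i + j <= m]                        # candidate must fit inside t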

The candidate set corresponding to each node in the model SDT contains $O(nm^2)$ candidate shapelets, where $n$ is the number of time series and $m$ is the length of each time series. In our model, we consider only the subsequences of the test instance T, so there are only $O(m^2)$ candidate shapelets to evaluate for each node. Our work thus reduces the size of the candidate shapelet collection of each node by an order of magnitude.

The purpose of extracting shapelets from time series is to classify the time series using the discriminator. The discriminability of shapelets provides us with a way to explain the classification results. Algorithm 3 introduces the method for finding the best shapelet for the instance to be classified on the evaluation dataset. First, the candidate shapelet set is generated for T (line 3). Then, each candidate is evaluated by equation (7) to search for the best shapelet (lines 4–8).

Input: test instance: T; shapelet evaluation dataset for T: evalData; the minimum and maximum length of shapelet: min and max.
Output: the best shapelet of the test case: bestShapelet.
(1) bestShapelet ← null
(2) double bestGain ← 0
(3) CandidateSet ← GenerateCandidatesForT(T, min, max)
(4) for each S ∈ CandidateSet do
(5)   double gain ← assessCandidate(evalData, S)
(6)   if gain > bestGain then
(7)     bestGain ← gain
(8)     bestShapelet ← S
(9)   end if
(10) end for
(11) return bestShapelet
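Combining the sketches above, a hedged Python version of Algorithm 3 follows; the split-threshold search over midpoints of consecutive sorted distances follows the procedure in [18], eval_data is a list of (series, label) pairs, and the helpers are the illustrative versions defined earlier:

def find_best_shapelet(test_instance, eval_data, min_len, max_len):
    # Return (shapelet, threshold, gain) with the maximum information gain
    # over the evaluation dataset (Algorithm 3)
    best, best_split, best_gain = None, None, 0.0
    labels = [y for _, y in eval_data]
    for s in generate_candidates(test_instance, min_len, max_len):
        dists = [subdist(t, s) for t, _ in eval_data]
        order = sorted(dists)
        # candidate thresholds: midpoints between consecutive sorted distances
        for a, b in zip(order, order[1:]):
            gain = information_gain(dists, labels, (a + b) / 2)
            if gain > best_gain:
                best, best_split, best_gain = s, (a + b) / 2, gain
    return best, best_split, best_gain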

The time complexity of our model to find the optimal feature is $O(km^4)$, while that of the brute-force search algorithm in SDT is $O(n^2m^4)$. Considering that $k$ is smaller than $n$, the computation involved in searching for the best shapelet is significantly reduced. In addition, two techniques are used to improve the efficiency of the shapelet search [18, 25]: the early abandonment mechanism and the shapelet entropy pruning strategy.

3.4. Lazy Shapelet Classification Algorithm

A classification route based on shapelets for each instance to be classified is built through Algorithm 4.

Input: training dataset: D; the test instance: T; the distance function used to calculate the similarity between complete time series: dist.
Output: the classification route for T: CRForT.
(1) Build the initial evaluation dataset for T at the root node by Algorithm 1
(2) Generate the candidate shapelet set for T by Algorithm 2
(3) Evaluate each candidate shapelet on the evaluation dataset using Algorithm 3 and search for the best shapelet S
(4) if no discriminatory shapelet S can be found then
(5)   return CRForT; the majority class c in the current evaluation dataset is taken as the predicted value
(6) else
(7)   Update the evaluation dataset to exclude the instances that do not contain the feature S
(8)   Build the child node
(9) end if
(10) repeat steps 2 to 9 until the end
(11) return CRForT

Algorithm 4 mainly consists of five steps. First, the targeted shapelet evaluation dataset is established for T (line 1). Second, the candidate shapelet set is generated for T (line 2). Third, the model searches for the best shapelet on the evaluation dataset (line 3) and judges whether the termination condition is satisfied (line 4). The termination condition is not met at the initial node; that is, the route will not degrade to a single node, and a best shapelet S is generally found. Fourth, the extracted shapelet S is applied to update the evaluation dataset for the child node (lines 7-8); only the training instances whose distances to S are not greater than the split threshold are kept in the subdataset. Fifth, the model repeats steps 2–8 until the termination condition is satisfied (line 10). Finally, the shapelet-based classification route for T is returned (line 11).
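The whole lazy route of Algorithm 4 can then be sketched in a few lines of Python (again illustrative; the helpers are the versions defined above, and the real implementation additionally applies the early-abandon and entropy-pruning speedups):

from collections import Counter

def classify(train, test_instance, k, dist, min_len, max_len):
    # Lazy shapelet classification route (Algorithm 4): progressively shrink
    # the evaluation dataset with the best shapelet, then vote on what remains
    data = build_evaluation_dataset(train, test_instance, k, dist)
    route = []
    while True:
        s, split, gain = find_best_shapelet(test_instance, data, min_len, max_len)
        if s is None or gain <= 0:  # termination: no discriminatory shapelet left
            break
        route.append((s, split))
        # keep only the instances containing the local feature (lines 7-8)
        data = [(t, y) for t, y in data if subdist(t, s) <= split]
    majority = Counter(y for _, y in data).most_common(1)[0][0]
    return majority, route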

3.5. Computing Shapelet Coverage Score

Definition 9 (shapelet coverage). Shapelet coverage refers to the corresponding time interval of a given shapelet S. If a time stamp t falls within the shapelet coverage of S, then we state that t is covered by S.
In our work, the discriminatory score is calculated for each time stamp based on the coverage intervals of all obtained shapelets. First, an indicator function is presented to determine whether a time stamp $t$ is covered by a given shapelet $S$:
$$\mathbb{1}(t, S) = \begin{cases} 1, & sp \le t \le sp + l_S - 1, \\ 0, & \text{otherwise}, \end{cases}$$
where $sp$ denotes the starting position of $S$ in the time series and $l_S$ is the length of $S$.
Then, based on the shapelets captured on the decision paths of all correctly predicted time series, the importance of time stamps for different classes can be evaluated through the following formula:
$$\mathrm{Score}_c(t) = \frac{1}{|D_c|}\sum_{T \in D_c} \sum_{S \in \mathcal{S}_T} \mathbb{1}(t, S),$$
where $D_c$ represents the set of correctly predicted instances with class value $c$ and $\mathcal{S}_T$ indicates the set of shapelets captured on the classification path of $T$.
In essence, the coverage score reflects the discriminability of the time stamp $t$, which can be used to detect the distribution of distinguishing feature intervals and the feature frequency information for each category. Generally, an interval composed of several consecutive time stamps with similar nonzero scores corresponds to a local differentiating feature; therefore, in our work, such intervals are regarded as local feature locations, and their scores indicate the occurrence frequency of the features.
Here, the algorithm of computing the shapelet coverage scores for different classes in the dataset will be introduced.
In Algorithm 5, to compute the shapelet coverage score for each class, the dataset of the correctly predicted instances with the specific class is first obtained (lines 2-3). Then, all shapelets captured by LSCR for every instance in this dataset are collected (lines 4–6). Finally, the scores reflecting the discriminability of each time stamp for the class are calculated based on equation (12) (line 7).

Input: the dataset of correctly predicted instances: D*; the set of class values: C; the length of time series: m.
Output: the shapelet coverage scores: Scores
(1) Initialize a |C| × m matrix Scores
(2) for each c ∈ C do
(3)   Select the instances with class c from D* to form the dataset Dc
(4)   for each T in Dc do
(5)     Get the shapelet set of T on its classification route learned by Algorithm 4
(6)   end for
(7)   Calculate the shapelet coverage scores Scores[c][·] based on equation (12)
(8) end for
(9) return Scores
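A compact Python counterpart of Algorithm 5 and equation (12) as reconstructed above might look as follows (routes is our own assumed bookkeeping structure: it maps each class value to the list of its correctly predicted instances, each described by the (length, starting position) pairs of the shapelets on its route):

import numpy as np

def coverage_scores(routes, m):
    # Per-class, per-time-stamp shapelet coverage scores (equation (12))
    scores = {}
    for c, instances in routes.items():
        row = np.zeros(m)
        for shapelet_spans in instances:
            for length, sp in shapelet_spans:
                row[sp:sp + length] += 1.0  # indicator summed over covered stamps
        scores[c] = row / max(len(instances), 1)  # average over class-c instances
    return scores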

4. Experiments

Experimental analyses are conducted on 20 datasets from the UCR time series repository [38], most of which are frequently used for the evaluation of shapelet-based models [9, 21, 22, 24, 27]. The experimental data are divided into training and test parts; the former is used to build the model, while the latter is applied to calculate the classification accuracy. The information on the datasets is listed in Table 1, including train (size of the training set), test (size of the test set), max_k (the minimum number of instances of a class in the training set, that is, the maximum value that parameter k can take), length (the length of the time series), and classes (the number of classes).

4.1. Parameter k Analysis

To study the effect of the size of the evaluation dataset on the discriminative evaluation of shapelets, the accuracy trends of one representative fusion model with k in the specified range are first analyzed over 10 datasets. Then, the average accuracy curves of the 5 fusion models (based on DTW, ERP, ED, TWE, and MSM, respectively) are presented for parameter setting.

Figure 3 shows the sensitivity of the prediction results on different datasets to the parameter k. From Figure 3(a), it can be seen that most accuracy rates on the 5 binary-class datasets reach their maximum values when k is 5, after which all of the curves show a significant downward trend. From Figure 3(b), except for MiddlePhalanxOutlineAgeGroup, the accuracy rates on the other 4 multiclass datasets exhibit a growth trend in the early stage as k increases. When k is greater than 6, the accuracy on each dataset tends to stabilize or decrease. These experimental results suggest that the accuracy of LSCR generally varies regularly with k. Therefore, we propose to set k based on the trend of the average accuracy.

In Figure 4(a), it can be seen that, on the binary-class datasets, the average accuracy variations of the 5 fusion models show two significantly different trends: one increases and gradually becomes stable, while the other declines significantly in accuracy after passing the inflection point. In Figure 4(b), on the multiclass datasets, the average accuracies of all fusion models first increase and then decrease slowly after reaching the peak. Finally, in our work, the value of k corresponding to the highest average accuracy is set as the final parameter of each fusion model on the binary-class and multiclass datasets, respectively. See Table 2 for the specific settings of the 5 fusion models.

In addition, the effect of instance selection on model performance is interpreted in Figure 5. In the scatter diagram, each point stands for a dataset; the more points that fall below the diagonal line, the better the performance of the model with instance selection. Since the targeted evaluation dataset is very helpful for improving feature quality, the proposed shapelet evaluation strategy can significantly improve the performance of our model. As shown in Figure 5, the model with the targeted evaluation dataset outperforms its counterpart built on the whole training set (denoted "whole") on almost all datasets. The accuracy rates of the above two models are listed in Table 3.

4.2. Fusion Strategy Analysis

In this section, the effectiveness of the fusion strategy of global and local similarities is analyzed. Table 3 presents the accuracies of 5 1NN models combined with different distance functions (DTW, ERP, ED, TWE, and MSM) and their corresponding five fusion models. The accuracies of the 1NN models are taken from the website [38].

Figure 6 shows a critical difference diagram, as studied in the literature [39, 40], for the 10 classification models on the 20 datasets. This diagram is used for the overall test of significance of average ranks and can group models without significant differences into cliques. From the figure, we find that there are no significant differences in the performance of the 10 models on the 20 datasets, but the rankings of all fusion models are better than those of the corresponding 1NN models. This result suggests that the fusion model can effectively improve the classification performance of the 1NN model to some extent. The top-ranked fusion model will be used for further analysis in the following.

4.3. Performance Analysis

In this section, the proposed model is compared with shapelet-based classifiers, 1NN classifiers combined with several commonly used distance functions, and deep learning models.

4.3.1. Comparison with Shapelet-Based Classifiers

Since the classification process of the proposed model is similar to that of a decision tree, for the sake of fairness, all shapelet-based comparison models use a decision tree to make predictions. Our model is first compared with SDT, the fast shapelet tree (FS) [26], and the C4.5 model [41] combined with two respective shapelet transform algorithms. The symbol SSC4.5 denotes the C4.5 model based on the transform algorithm proposed by Yuan et al. [22], and STC4.5 represents the C4.5 model based on the one proposed by Lines et al. [20]. The shapelet length ranges from 3 to the full length of the time series and increases by 1 each time. All experimental results are listed in Table 4, and the accuracies of FS are obtained from the website [38]. In addition, the last line in Table 4 provides the average accuracy of each model over the 20 datasets.

As observed from Figure 7, although there is no significant difference between our model and the other 4 shapelet-based models, the average rank of our model is the best.

Figure 8 presents the scatter plots of accuracy comparisons between our model and the 4 classical shapelet-based classifiers. Figure 8(a) shows that our model is better than SDT (14 of 20) over the 20 datasets. From Table 4, we find that, compared with SDT built on the entire training set, the accuracies of our model on the datasets MoteStrain, FacesUCR, SonyAIBORobotSurface2, etc., are significantly improved. In particular, on the dataset FacesUCR, the accuracy of our model is 20% greater than that of SDT. From Figures 8(b)–8(d), it can be concluded that our model is also better than STC4.5 (13 of 16), SSC4.5 (13 of 20), and FS (14 of 20). Hence, Figure 8 demonstrates that our model outperforms the existing shapelet-based decision tree models on the 20 datasets.

To compare the time complexity of our model with other shapelet-based models, the changes in the running times of 6 shapelet-based models with increasing numbers of instances and instance lengths are shown in Figure 9. The notation (single) represents the training time of a single model for a specific test instance, while the unmarked notation represents the model running on the entire test set. In the experiment, a binary-class dataset with a uniform distribution is designed to run the analysis models, and the size of the training set is always the same as that of the test dataset. In the first experiment, the sizes of the training and test datasets increase by 10 at a time, while the length of all instances remains 100. In the second, the instance length increases by 10 at a time, while the sizes of the training and test sets are fixed at 100.

From Figures 9(a) and 9(b), we can determine that the training time of our model (single) is not sensitive to the size of the training set and is polynomial with respect to the length of the time series. In particular, it is faster than the currently fastest shapelet-based model, FS. Further, as observed from Figures 9(c) and 9(d), with increasing training set size and instance length, the time consumption gap between SDT, STC4.5, and our model widens, while the gap between our model and SSC4.5 is not obvious.

In conclusion, our model is an accurate and fast shapelet-based classification model, which forms the basis for learning feature distribution and frequency information.

4.3.2. Comparison with Various 1NN Classifiers

In this section, beyond the five distance functions listed above, we further compare our model with several DTW variants, including WDTW, complexity-invariant DTW (CID) [42], and the variant proposed in [43]. The accuracies for the above 1NN models are provided by the website [38] (see Table 4).

Figure 10 demonstrates that our model exhibits no significant difference from DTW and its variants, but the average rank of our model is the best. In Figure 11, it is clear that our model is better than ED (16 of 20), TWE (14 of 20), ERP (13 of 20), MSM (12 of 20), DTW (11 of 20), WDTW (14 of 20), CID (11 of 20), and the variant in [43] (13 of 20).

In summary, our model is competitive with the 1NN models that depend on global similarity. The comparison results suggest that continuously narrowing the search space of class labels based on local similarity is an effective way to make predictions.

4.3.3. Comparison with Deep Learning Models

Recently, various deep learning models have been widely studied in the field of time series classification. Karim et al. [44] attempted to improve the univariate time series classification performance of fully convolutional networks (FCNs) by adding long short-term memory recurrent neural network (LSTM-RNN) submodules and an attention mechanism and proposed the excellent models LSTM-FCN and ALSTM-FCN. Furthermore, the authors applied the above two models to the multivariate time series classification problem [45] and studied the reasons why the two models have superior performance [46]. Fawaz et al. [47] reviewed deep learning models for time series classification. Here, our model is compared with LSTM-FCN and the 9 deep learning models (ResNet, FCN, Encoder, MLP, Time-CNN, TWIESN, MCDCNN, MCNN, and t-LeNet) analyzed in the literature [47]. All the experimental results were obtained from the corresponding literature (see Table 5).

Figure 12 shows that, except for ResNet and FCN, LSTM-FCN is significantly better than the other models, and that our model is significantly better than MCNN and t-LeNet. Among the 10 deep learning models, the average rank of our model is better than that of 7 of them. In Figure 13, it can be seen that our model is better than Encoder (12 of 20), Time-CNN (12 of 20), MLP (12 of 20), TWIESN (13 of 20), MCDCNN (17 of 20), t-LeNet (19 of 20), and MCNN (20 of 20).

Generally, to improve performance, deep learning model tuning requires an enormous computational cost. To pursue the optimal accuracy rate, Fawaz et al. [48] even proposed the neural network ensemble model with 60 deep learning models, but it is still not better than the traditional ensemble model HIVE-COTE [49]. It is unfair to compare our model with the deep learning model based on accuracy alone. In addition to improving accuracy, we believe that the model interpretability and data comprehensibility require more attention. However, the existing feature extraction methods for time series usually cannot simultaneously obtain the feature distribution and frequency information. For the deep learning model, it is difficult to train a targeted model for the specific instance on a small dataset. Accordingly, in our work, a highly interpretable classification model based on the lazy learning strategy is built for each target instance, which can be applied to gain insight into the local feature distribution and frequency information. The following is a detailed introduction.

4.4. Interpretability

To demonstrate the stronger interpretability of our model, this section separately analyzes it on a binary-class dataset, MoteStrain, and a multiclass dataset, CBF.

4.4.1. MoteStrain Dataset

The sensing data in MoteStrain were originally collected to detect potential variables online in a sensor network [50]. The classification task on this dataset is to distinguish whether a sensor is used for humidity or temperature measurement. The classification performance of our model is significantly better than that of the comparison models and is close to the current best classification result provided by Bagnall et al. [38] on this dataset.

To further investigate the proposed model, the shapelet decision tree built by SDT is shown in Figure 14, where the shapelet shown is extracted from the fourth training instance in the root node of the shapelet decision tree. As seen from Figure 14, there is only one shapelet in the decision tree, where d denotes the distance between the test instance and the shapelet and δ represents the split threshold of the shapelet. Based on the shapelet decision tree, when the distance between the test instance and the shapelet corresponding to the root node is not greater than the split threshold, the class prediction value of the test instance is Class1; otherwise, it is Class2.

Figure 15 shows six instances and the shapelets extracted for them by our model. Our model can correctly predict the class labels of these six instances, while SDT fails. Meanwhile, it is obvious that there are not only significant differences between instances of different classes but also intraclass variations among similar instances. For example, the differences among the three instances with the same class label in Figure 15 are noticeable, and there are no obvious common features. However, in the shapelet decision tree built by SDT, only one shapelet is found, which is not sufficient to distinguish the two classes. Comparing with the illustration of the shapelet given in Figure 14, it is not difficult to find that the optimal shapelet obtained in the shapelet decision tree is not the most discriminatory feature for the test instances. This result verifies that the shapelets extracted from the entire dataset are the most discriminatory for the training instances only in an average sense; this is also the reason for the poor performance of SDT on the MoteStrain dataset.

Since the characteristics of each test instance are considered in our model, we can achieve better classification results in this situation. In addition, based on the shapelets obtained by our model, the prediction process of each instance can be explained. For example, as shown in Figure 15, the reason why the 11th test instance belongs to Class1 is that its local feature lies in its initial stage, while the local feature of the 41st test instance in the middle part determines its predicted class label.

Figure 16 displays the shapelet coverage scores on the MoteStrain training and test datasets. It is obvious that the high-score intervals on the training and test sets are very similar. Since there are only 20 instances in the training set, the shapelet coverage scores (as shown in Figure 16(a)) may not accurately reflect the local characteristics. However, we propose to directly evaluate the local characteristics of the test cases, which allows us to utilize large amounts of test data. In Figure 16(b), it can be seen that local features of the various classes with different coverage frequencies have been detected from the test dataset, which cannot be captured by other shapelet-based models. For example, the intervals [9, 16] (low frequency), [44, 54] (medium frequency), and [59, 76] (high frequency) (as shown in Figure 16(b)) are three significant discriminative intervals of Class2, while the interval [21, 50] (high frequency) is the most discriminative part of Class1. With the proposed model, using more instances with accurate labels yields more accurate local feature information. This statistical information helps us acquire more comprehensive local characteristic information of time series data, such as feature distribution and frequency.

4.4.2. CBF Dataset

This section studies the multiclass dataset CBF, which contains three types: Cylinder, Bell, and Funnel. Figure 17 shows the coverage scores of the CBF test dataset. It can be determined that the interval [31, 75] covers the most discriminative intervals for all three classes. In addition, unlike Cylinder and Funnel, the local interval [0, 15] with relatively low coverage frequency is discriminative for Bell.

Next, four specific test instances are presented. As shown in Figure 18, for the multiclass dataset, our model can not only detect high-frequency shapelets (Figures 18(a)–18(c)) but also effectively capture low-frequency shapelets (Figure 18(d)). The proposed model is thus helpful for making a targeted analysis of each category of data and each test instance.

5. Conclusions

Aiming at the problems of global shapelet-based models built on the whole training set, a data-driven model fusing global and local similarities is proposed. In the model, the shapelet discriminability is evaluated through a specific subdataset. A smaller evaluation dataset reduces the computational time and improves the quality of shapelets. Moreover, target learning for each instance helps us understand the prediction process clearly. For example, the shapelets extracted by our LSCR model can be directly used to explain what characteristics determine the class value of the test instance. Furthermore, the proposed shapelet coverage score is applied to accurately analyze the local feature information of each class, which provides comprehensive insight into data characteristics. In the future, the application of the model in specific fields will be further studied, including ECG detection and image contour feature discovery.

Data Availability

The time series data used to support the findings of this study have been deposited in the UEA and UCR Time Series Classification Repository (http://www.timeseriesclassification.com).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61672086, 61702030, and 61771058) and Beijing Natural Science Foundation (no. 4182052).