Time series motifs discovery under DTW allows more robust discovery of conserved structure
Data Mining and Knowledge Discovery ( IF 4.8 ) Pub Date : 2021-02-16 , DOI: 10.1007/s10618-021-00740-0
Sara Alaee , Ryan Mercer , Kaveh Kamgar , Eamonn Keogh

In recent years, time series motif discovery has emerged as perhaps the most important primitive for many analytical tasks, including clustering, classification, rule discovery, segmentation, and summarization. In parallel, it has long been known that Dynamic Time Warping (DTW) is superior to other similarity measures such as Euclidean Distance under most settings. However, due to the computational complexity of both DTW and motif discovery, virtually no research efforts have been directed at combining these two ideas. The current best mechanisms to address their lethargy appear to be mutually incompatible. In this work, we present the first efficient, scalable and exact method to find time series motifs under DTW. Our method automatically performs the best trade-off of time-to-compute versus tightness-of-lower-bounds for a novel hierarchy of lower bounds that we introduce. As we shall show through extensive experiments, our algorithm prunes up to 99.99% of the DTW computations under realistic settings and is up to three to four orders of magnitude faster than the brute force search, and two orders of magnitude faster than the only other competitor algorithm. This allows us to discover DTW motifs in massive datasets for the first time. As we will show, in many domains, DTW-based motifs represent semantically meaningful conserved behavior that would escape our attention using all existing Euclidean distance-based methods.
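To make the idea of pruning DTW computations with a lower bound concrete, the following is a minimal sketch and not the authors' algorithm. It uses a plain pairwise scan with a single, well-known lower bound (LB_Keogh) rather than the hierarchy of lower bounds the abstract describes. The function names, the z-normalization step, the warping window `w`, and the trivial-match exclusion (`j >= i + m`) are all assumptions chosen for illustration.

```python
import math


def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd if sd > 0 else 0.0 for v in x]


def dtw(a, b, w):
    """DTW distance between equal-length sequences, Sakoe-Chiba band of half-width w."""
    n = len(a)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[n])


def lb_keogh(a, b, w):
    """LB_Keogh: a cheap lower bound on dtw(a, b, w) built from the envelope of b."""
    n = len(a)
    total = 0.0
    for i in range(n):
        window = b[max(0, i - w): min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if a[i] > upper:
            total += (a[i] - upper) ** 2
        elif a[i] < lower:
            total += (lower - a[i]) ** 2
    return math.sqrt(total)


def dtw_motif(series, m, w):
    """Return ((distance, i, j), n_dtw_calls) for the closest non-overlapping pair.

    Illustrative pairwise scan: the full DTW is computed only when the
    lower bound cannot rule the pair out against the best-so-far distance.
    """
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, None, None)
    n_dtw_calls = 0
    for i in range(len(subs)):
        for j in range(i + m, len(subs)):  # skip overlapping (trivial) matches
            if lb_keogh(subs[i], subs[j], w) >= best[0]:
                continue  # pruned: the pair cannot beat the best-so-far
            n_dtw_calls += 1
            d = dtw(subs[i], subs[j], w)
            if d < best[0]:
                best = (d, i, j)
    return best, n_dtw_calls
```

Because the lower bound never exceeds the true DTW distance, skipping a pair whose bound is already no better than the best-so-far does not change the answer, which is why a pruned search of this kind can remain exact. The fraction of pairs pruned in this sketch depends entirely on the data and on `w`; it says nothing about the pruning rates reported in the paper.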




Updated: 2021-02-16