An ultra-fast time series distance measure to allow data mining in more complex real-world deployments
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2020-05-30, DOI: 10.1007/s10618-020-00695-8
Shaghayegh Gharghabi, Shima Imani, Anthony Bagnall, Amirali Darvishzadeh, Eamonn Keogh

At their core, many time series data mining algorithms reduce to reasoning about the shapes of time series subsequences. This requires an effective distance measure, and for the last two decades most algorithms have used Euclidean distance or DTW as their core subroutine. We argue that these distance measures are not as robust as the community seems to believe. The undue faith in these measures perhaps derives from an overreliance on benchmark datasets and from self-selection bias. The community is simply reluctant to address more difficult domains, for which current distance measures are ill-suited. In this work, we introduce a novel distance measure, MPdist. We show that our proposed distance measure is much more robust than current distance measures. For example, it can handle data with missing values or spurious regions. Furthermore, it allows us to successfully mine datasets that would defeat any Euclidean or DTW distance-based algorithm. Additionally, we show that our distance measure can be computed so efficiently as to allow analytics on very fast-arriving streams.

Updated: 2020-05-30