catch22: CAnonical Time-series CHaracteristics
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2019-08-09, DOI: 10.1007/s10618-019-00647-x
Carl H. Lubba, Sarab S. Sethi, Philip Knaute, Simon R. Schultz, Ben D. Fulcher, Nick S. Jones

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147,000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a set of 22 CAnonical Time-series CHaracteristics, catch22, tailored to the dynamics typically encountered in time-series data-mining tasks. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
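The abstract states the two selection criteria, (i) strong classification performance and (ii) minimal redundancy, only at a high level. The Python sketch below illustrates them with a simple greedy filter over a table of per-dataset accuracies. It is a simplified stand-in, not the procedure used in the paper. The accuracy table, the helper names `pearson` and `select_features`, the use of mean accuracy for ranking, and the `max_corr = 0.8` threshold are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_features(acc: List[List[float]], k: int, max_corr: float = 0.8) -> List[int]:
    """Greedily pick up to k feature indices from a features x datasets accuracy table.

    Hypothetical illustration, not the catch22 authors' pipeline.
    (i) Performance: candidates are visited in order of decreasing mean accuracy.
    (ii) Redundancy: a candidate is rejected if its accuracy profile across datasets
    is positively correlated (r >= max_corr) with any feature already selected.
    """
    order = sorted(range(len(acc)), key=lambda i: sum(acc[i]) / len(acc[i]), reverse=True)
    selected: List[int] = []
    for i in order:
        if len(selected) == k:
            break
        if all(pearson(acc[i], acc[s]) < max_corr for s in selected):
            selected.append(i)
    return selected


# Hypothetical accuracies: rows are features, columns are datasets.
acc = [
    [0.90, 0.60, 0.80, 0.70],  # feature 0: best mean accuracy
    [0.88, 0.58, 0.79, 0.69],  # feature 1: near-copy of feature 0's profile
    [0.60, 0.85, 0.65, 0.80],  # feature 2: strong where feature 0 is weak
]
print(select_features(acc, k=2))  # [0, 2]
```

In this sketch, redundancy is judged by correlating accuracy profiles rather than raw feature values. Two features count as interchangeable when they succeed and fail on the same datasets. An anti-correlated profile, like feature 2's, signals complementary strengths, so only positive correlation leads to rejection.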

Updated: 2019-08-09