catch22: CAnonical Time-series CHaracteristics, Data Mining and Knowledge Discovery


Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2019-08-09, DOI: 10.1007/s10618-019-00647-x
Carl H. Lubba, Sarab S. Sethi, Philip Knaute, Simon R. Schultz, Ben D. Fulcher, Nick S. Jones

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147,000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a set of 22 CAnonical Time-series CHaracteristics, catch22, tailored to the dynamics typically encountered in time-series data-mining tasks. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
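
To make the idea of an interpretable feature vector concrete, the following is a minimal sketch in Python using only the standard library. It is not the catch22 implementation and does not reproduce any of the 22 features' definitions; the function names (`lag1_autocorrelation`, `mean_abs_successive_diff`, `toy_features`) and the choice of four toy features are assumptions made for illustration, loosely echoing the property families the abstract names (linear autocorrelation, successive differences, value distribution).

```python
from statistics import fmean, pstdev


def lag1_autocorrelation(x):
    """Linear autocorrelation at lag 1 (biased estimator).

    Hypothetical helper for illustration; not a catch22 feature definition.
    """
    mu = fmean(x)
    denom = sum((v - mu) ** 2 for v in x)
    if denom == 0:
        return 0.0  # constant series: treat autocorrelation as 0 by convention
    num = sum((a - mu) * (b - mu) for a, b in zip(x, x[1:]))
    return num / denom


def mean_abs_successive_diff(x):
    """Average absolute change between consecutive samples."""
    return fmean(abs(b - a) for a, b in zip(x, x[1:]))


def toy_features(x):
    """Map a time series (sequence of numbers) to a small named feature vector.

    The four features here are illustrative stand-ins, chosen only to show
    how a series of any length reduces to a fixed set of named properties.
    """
    return {
        "mean": fmean(x),
        "std": pstdev(x),
        "ac_lag1": lag1_autocorrelation(x),
        "mean_abs_diff": mean_abs_successive_diff(x),
    }


if __name__ == "__main__":
    trend = [1, 2, 3, 4, 5]
    alternating = [1, -1, 1, -1]
    print(toy_features(trend))
    print(toy_features(alternating))
```

In this sketch a smooth trend yields a positive lag-1 autocorrelation and small successive differences, while an alternating series yields a negative autocorrelation and large successive differences, so two series of different lengths become directly comparable through the same four named properties. Each feature is computed in a single pass over the data, which is the kind of linear scaling with series length that the abstract highlights for the real feature set.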


Updated: 2019-08-09
