Robust archetypoids for anomaly detection in big functional data
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2020-08-03, DOI: 10.1007/s11634-020-00412-9
Guillermo Vinue, Irene Epifanio

Archetypoid analysis (ADA) has proven to be a successful unsupervised statistical technique for identifying extreme observations on the periphery of the data cloud, in both classical multivariate data and functional data. However, two questions remain open in this field: the use of ADA for outlier detection, and its scalability. We propose using robust functional archetypoids and the adjusted boxplot to pinpoint functional outliers. Furthermore, we present a new archetypoid algorithm that obtains results from large data sets in reasonable time. Functional time series arise in many practical problems, so this paper focuses on functional data settings. The new algorithm for detecting functional anomalies, called CRO-FADALARA, can be used with both univariate and multivariate curves. Our proposal for outlier detection is compared with all the state-of-the-art methods in a controlled study, where it shows good performance. CRO-FADALARA is also applied to two large time series data sets, for which the outlier curves are discussed and the reduction in computational time is reported. A third case study, with a small ECG data set, is discussed given its importance in functional data scenarios. All data, R code and a new R package are freely available.
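
The abstract names the adjusted boxplot as the rule used to pinpoint outliers. As a rough illustration of that rule only, here is a minimal Python sketch of the generic skewness-adjusted boxplot fences (the Hubert and Vandervieren form, based on the medcouple) applied to a plain vector of scalar scores. It is not the authors' CRO-FADALARA code and does not reproduce their R package. The function names `medcouple_naive`, `adjusted_boxplot_fences` and `flag_outliers` are hypothetical. The medcouple here is a simplified O(n^2) version that skips pairs tied at the median, and how the paper derives a score per curve is not specified in this abstract.

```python
import numpy as np


def medcouple_naive(x):
    """Naive O(n^2) medcouple (robust skewness measure).

    Simplification (assumption): pairs where both values equal the
    median are skipped instead of using the special tie kernel.
    """
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    lower = x[x <= m]  # candidates x_i <= median
    upper = x[x >= m]  # candidates x_j >= median
    num = (upper[:, None] - m) - (m - lower[None, :])
    den = upper[:, None] - lower[None, :]
    valid = den > 0  # drops pairs with x_i == x_j == median
    if not valid.any():
        return 0.0  # all values identical: treat as symmetric
    return float(np.median(num[valid] / den[valid]))


def adjusted_boxplot_fences(x):
    """Lower and upper fences of the skewness-adjusted boxplot."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mc = medcouple_naive(x)
    if mc >= 0:
        low = q1 - 1.5 * np.exp(-4.0 * mc) * iqr
        high = q3 + 1.5 * np.exp(3.0 * mc) * iqr
    else:
        low = q1 - 1.5 * np.exp(-3.0 * mc) * iqr
        high = q3 + 1.5 * np.exp(4.0 * mc) * iqr
    return low, high


def flag_outliers(scores):
    """Boolean mask of scores lying outside the adjusted fences."""
    scores = np.asarray(scores, dtype=float)
    low, high = adjusted_boxplot_fences(scores)
    return (scores < low) | (scores > high)


if __name__ == "__main__":
    # Hypothetical per-curve scores; the last one is far from the rest.
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(flag_outliers(scores))
```

When the medcouple is zero the fences reduce to the classical 1.5 × IQR boxplot rule. For skewed scores the exponential factors widen the fence on the long-tailed side, so fewer regular observations in that tail are flagged.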




Updated: 2020-08-04