Detecting and classifying outliers in big functional data
Advances in Data Analysis and Classification (IF 1.4), Pub Date: 2021-08-30, DOI: 10.1007/s11634-021-00460-9
Oluwasegun Taiwo Ojo 1, 2 , Antonio Fernández Anta 1 , Rosa E. Lillo 3, 4 , Carlo Sguera 3

We propose two new outlier detection methods for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD detects and classifies outliers by computing three indices for each curve, all based on the concepts of linear regression and correlation, which measure outlyingness in terms of shape, magnitude, and amplitude relative to the other curves in the data. The first method, ‘Semifast-MUOD’, uses a sample of the observations in computing the indices, while the second, ‘Fast-MUOD’, uses the point-wise or \(L_1\) median. The classical boxplot is used to separate the indices of the outliers from those of the typical observations. Performance evaluation of the proposed methods using simulated data shows significant improvements over MUOD in both outlier detection and computational time. We show that Fast-MUOD is especially well suited to handling big and dense functional datasets, with very small computational time compared to other methods. Further comparisons with some recent outlier detection methods for functional data also show superior or comparable outlier detection accuracy for the proposed methods. We apply the proposed methods to weather, population growth, and video data.
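
To make the idea of regression- and correlation-based indices concrete, the following is a minimal sketch in Python of a Fast-MUOD-style computation against the point-wise median. The abstract does not give the formulas, so the index definitions used here are assumptions made for illustration: shape as one minus the Pearson correlation with the median curve, amplitude as the absolute deviation of the regression slope from 1, and magnitude as the absolute regression intercept. The function name `fast_muod_sketch`, the one-sided boxplot rule, and the whisker factor of 1.5 are likewise hypothetical choices, not the authors' implementation.

```python
import numpy as np


def fast_muod_sketch(curves, whisker=1.5):
    """Illustrative sketch only; index formulas are assumptions.

    curves: array of shape (n_curves, n_points), all curves observed on a
    common grid. Assumes no curve (and not the median curve) is constant,
    otherwise the correlation below divides by zero.
    """
    curves = np.asarray(curves, dtype=float)
    n_points = curves.shape[1]

    # Reference curve: point-wise median across all curves.
    ref = np.median(curves, axis=0)

    # Centre each curve and the reference over the grid points.
    ref_c = ref - ref.mean()
    cur_c = curves - curves.mean(axis=1, keepdims=True)

    # Sample covariance of each curve with the reference, and variances.
    cov = cur_c @ ref_c / (n_points - 1)
    var_ref = ref_c @ ref_c / (n_points - 1)
    std_cur = cur_c.std(axis=1, ddof=1)

    # Simple linear regression of each curve on the reference curve.
    corr = cov / (std_cur * np.sqrt(var_ref))
    slope = cov / var_ref
    intercept = curves.mean(axis=1) - slope * ref.mean()

    indices = {
        "shape": 1.0 - corr,            # assumed definition
        "amplitude": np.abs(slope - 1.0),  # assumed definition
        "magnitude": np.abs(intercept),    # assumed definition
    }

    # Boxplot rule (assumed one-sided): flag values above Q3 + whisker * IQR.
    flags = {}
    for name, values in indices.items():
        q1, q3 = np.percentile(values, [25, 75])
        flags[name] = values > q3 + whisker * (q3 - q1)
    return indices, flags
```

Under these assumptions, a curve shifted vertically would mainly inflate the magnitude index, a rescaled curve the amplitude index, and a curve with a different pattern the shape index, which is how the three indices allow outliers to be classified by type and not only detected.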




Updated: 2021-08-31