A further study comparing forward search multivariate outlier methods including ATLA with an application to clustering
Statistical Papers (IF 1.2), Pub Date: 2022-06-01, DOI: 10.1007/s00362-022-01319-7
Brenton R. Clarke, Andrew Grose

This paper compares automated procedures for robust multivariate outlier detection through discussion and simulation. In particular, it examines automated procedures that use the forward search along with Mahalanobis distances to identify and classify multivariate outliers subject to predefined criteria. Procedures utilizing a parametric model criterion based on a \(\chi ^2\)-distribution are among these, whereas the multivariate Adaptive Trimmed Likelihood Algorithm (ATLA) identifies outliers based on an objective function that is derived from the asymptotics of the location estimator assuming a multivariate normal distribution. Several criteria, including size (false positive rate), sensitivity, and relative efficiency, are canvassed. To illustrate relative efficiency in a multivariate setting in a new way, measures of variability of the multivariate location parameter are used when the underlying distribution is chosen from a multivariate generalization of the Tukey–Huber \(\epsilon \)-contamination model. Mean slippage models are also entertained. The simulation results are illuminating and demonstrate that no broadly accepted procedure outperforms the others in all situations, although one may ascertain circumstances in which a particular method may be best if implemented. Finally, the paper explores graphical monitoring for the existence of clusters and the potential for classification through the occurrence of multiple minima in the objective function using ATLA.
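To make the terms size and sensitivity concrete, the following is a minimal sketch of the plain \(\chi ^2\) cutoff on squared Mahalanobis distances, applied to simulated data. It is not the forward search, not ATLA, and not the authors' simulation design: it uses the classical (non-robust) sample mean and covariance, and it is restricted to two dimensions so that the \(\chi ^2_2\) quantile has a closed form. The function names, the 0.975 level, the contamination fraction `eps`, and the mean shift `shift` are all assumptions chosen for illustration.

```python
import math
import random


def mahalanobis_sq(points):
    """Squared Mahalanobis distances of 2-D points from their sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Unbiased sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # d^2 = (z - m)^T S^{-1} (z - m), using the closed-form 2x2 inverse.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def chi2_quantile_2df(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    # With 2 d.f. the distribution is exponential with mean 2, so the
    # quantile is -2 * log(1 - q). This shortcut holds only for 2 d.f.
    return -2.0 * math.log(1.0 - q)


def flag_outliers(points, level=0.975):
    """Flag points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_quantile_2df(level)
    return [d2 > cutoff for d2 in mahalanobis_sq(points)]


def simulate(n=200, eps=0.05, shift=6.0, seed=0):
    """Draw from (1 - eps) * N(0, I) + eps * N((shift, shift), I).

    Illustrative mixture with a shifted-mean contaminating component;
    the parameter values are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    points, is_contaminant = [], []
    for _ in range(n):
        contaminated = rng.random() < eps
        centre = shift if contaminated else 0.0
        points.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        is_contaminant.append(contaminated)
    return points, is_contaminant


def size_and_sensitivity(flags, is_contaminant):
    """Empirical false positive rate and detection rate."""
    clean = [f for f, c in zip(flags, is_contaminant) if not c]
    bad = [f for f, c in zip(flags, is_contaminant) if c]
    size = sum(clean) / len(clean) if clean else float("nan")
    sensitivity = sum(bad) / len(bad) if bad else float("nan")
    return size, sensitivity


if __name__ == "__main__":
    pts, truth = simulate()
    size, sens = size_and_sensitivity(flag_outliers(pts), truth)
    print(f"size={size:.3f} sensitivity={sens:.3f}")
```

Because the mean and covariance here are estimated from all points, contaminants inflate the estimates and can mask one another. Robust procedures such as those compared in the paper are designed to avoid that weakness.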




Updated: 2022-06-01