Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2021-12-22, DOI: 10.1080/10618600.2021.2000425
Sevvandi Kandanaarachchi, Rob J Hyndman
Abstract
This article introduces lookout, a new approach to detect outliers using leave-one-out kernel density estimates and extreme value theory. Outlier detection methods that use kernel density estimates generally employ a user-defined parameter to determine the bandwidth. Lookout uses persistent homology to construct a bandwidth suitable for outlier detection without any user input. We demonstrate the effectiveness of lookout on an extensive data repository by comparing its performance with other outlier detection methods based on extreme value theory. Furthermore, we introduce outlier persistence, a useful concept that explores the birth and the cessation of outliers with changing bandwidth and significance levels. The R package lookout implements this algorithm. Supplementary files for this article are available online.
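The combination of leave-one-out kernel density estimates with extreme value theory can be illustrated with a small Python sketch. It is not the lookout package and does not reproduce the article's algorithm in detail. The function names, the Gaussian kernel, the 90th-percentile threshold, and the scaling of the tail probability by the exceedance fraction are all assumptions made for illustration. The bandwidth `h` is supplied by the caller, whereas lookout derives it from persistent homology, which is not attempted here. The last function gives a rough picture of outlier persistence by recording which points are flagged as the bandwidth changes.

```python
import numpy as np
from scipy.stats import genpareto


def kde_and_loo_kde(X, h):
    """Gaussian KDE and leave-one-out KDE at every observation (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    # Pairwise squared Euclidean distances between observations.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    norm = (2.0 * np.pi) ** (d / 2.0) * h**d
    K = np.exp(-sq_dist / (2.0 * h**2)) / norm
    row_sums = K.sum(axis=1)
    kde = row_sums / n
    # Leave-one-out: drop each point's own kernel contribution.
    loo = (row_sums - np.diag(K)) / (n - 1)
    return kde, loo


def loo_outlier_pvalues(X, h, tail_quantile=0.90):
    """Tail probabilities of leave-one-out densities under a fitted GPD.

    Assumed recipe: fit a generalized Pareto distribution to the excesses of
    -log(kde) over a high quantile (peaks over threshold), then evaluate the
    fitted tail at -log(leave-one-out kde).
    """
    kde, loo = kde_and_loo_kde(X, h)
    tiny = np.finfo(float).tiny  # guards against log(0)
    y = -np.log(np.maximum(kde, tiny))
    y_loo = -np.log(np.maximum(loo, tiny))

    u = np.quantile(y, tail_quantile)
    excess = y[y > u] - u
    shape, _, scale = genpareto.fit(excess, floc=0)

    # Unconditional tail probability: P(Y > u) * P(Y - u > excess | Y > u).
    tail_frac = np.mean(y > u)
    p = np.ones(len(y))
    above = y_loo > u
    p[above] = tail_frac * genpareto.sf(y_loo[above] - u, shape, loc=0, scale=scale)
    return p


def outlier_persistence(X, bandwidths, alpha=0.05):
    """Boolean matrix: entry (i, j) is True if point j is flagged at bandwidths[i]."""
    return np.array([loo_outlier_pvalues(X, h) < alpha for h in bandwidths])
```

A point that is isolated from the rest of the data contributes heavily to its own ordinary density estimate, so removing that contribution makes its leave-one-out density much smaller and its tail probability correspondingly lower. Points flagged across a wide range of bandwidths in `outlier_persistence` can be read as more persistent outliers in the sense the abstract describes.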
Title: Leave-One-Out Kernel Density Estimates for Outlier Detection