COPOD: Copula-Based Outlier Detection
arXiv - CS - Information Retrieval
Pub Date: 2020-09-20, DOI: arxiv-2009.09463
Zheng Li, Yue Zhao, Nicola Botta, Cezar Ionescu, Xiyang Hu
Outlier detection refers to the identification of rare items that deviate
from the general data distribution. Existing approaches suffer from high
computational complexity, low predictive capability, and limited
interpretability. As a remedy, we present a novel outlier detection algorithm
called COPOD, which is inspired by copulas for modeling multivariate data
distribution. COPOD first constructs an empirical copula, and then uses it to
predict the tail probabilities of each data point to determine its level of
"extremeness". Intuitively, we think of this as calculating an anomalous
p-value. This makes COPOD parameter-free, highly interpretable, and
computationally efficient. In this work, we make three key contributions: we 1)
propose a novel, parameter-free outlier detection algorithm with both strong
performance and interpretability, 2) perform extensive experiments on 30
benchmark datasets to show that COPOD outperforms competing methods in most
cases and is also one of the fastest algorithms, and 3) release an easy-to-use
Python implementation for reproducibility.
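
To make the idea of tail probabilities concrete, here is a minimal Python sketch. It is not the authors' implementation and leaves out details of the published procedure. It assumes that each feature's empirical CDF gives a left-tail and a right-tail probability, and that a point's score is the sum over features of the negative log of the smaller tail. The function names `ecdf_tail_probs` and `tail_scores` and the toy data are hypothetical, chosen only for illustration.

```python
import math


def ecdf_tail_probs(column):
    """Return empirical left- and right-tail probabilities for each value."""
    n = len(column)
    # Left tail: fraction of observations less than or equal to x.
    left = [sum(1 for v in column if v <= x) / n for x in column]
    # Right tail: fraction of observations greater than or equal to x.
    right = [sum(1 for v in column if v >= x) / n for x in column]
    return left, right


def tail_scores(rows):
    """Score each row by summing negative log tail probabilities per feature."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        left, right = ecdf_tail_probs(column)
        for i in range(len(rows)):
            # Assumption: take the smaller tail so that extremeness in
            # either direction raises the score.
            scores[i] += -math.log(min(left[i], right[i]))
    return scores


# Toy data (made up): the last row is far from the others in both features.
data = [[1.0, 2.0], [1.1, 2.1], [0.9, 2.2], [1.2, 1.9], [8.0, 9.0]]
scores = tail_scores(data)
most_extreme = max(range(len(data)), key=scores.__getitem__)
print(most_extreme, [round(s, 3) for s in scores])
```

Because the score depends only on ranks within each feature, the sketch has no tuning parameters, and each feature's contribution to the score can be read off separately, which is one way to see where the interpretability claim comes from.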
Updated: 2020-09-22