COPOD: Copula-Based Outlier Detection, arXiv - CS - Information Retrieval


COPOD: Copula-Based Outlier Detection
arXiv - CS - Information Retrieval. Pub Date: 2020-09-20, DOI: arxiv-2009.09463
Zheng Li, Yue Zhao, Nicola Botta, Cezar Ionescu, Xiyang Hu

Outlier detection refers to the identification of rare items that deviate from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distributions. COPOD first constructs an empirical copula, and then uses it to predict the tail probabilities of each given data point to determine its level of "extremeness". Intuitively, we think of this as calculating an anomalous p-value. This makes COPOD parameter-free, highly interpretable, and computationally efficient. In this work, we make three key contributions: 1) we propose a novel, parameter-free outlier detection algorithm with both strong performance and interpretability, 2) we perform extensive experiments on 30 benchmark datasets to show that COPOD outperforms competing methods in most cases and is also one of the fastest algorithms, and 3) we release an easy-to-use Python implementation for reproducibility.
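The tail-probability idea in the abstract can be illustrated with a small, simplified sketch. It is not the authors' released implementation and may omit refinements of the full method; the function name `tail_scores` and the toy data are hypothetical. For each column it estimates left-tail and right-tail probabilities from the empirical CDF, sums their negative logarithms across columns, and keeps the larger sum as the outlier score.

```python
import math
from bisect import bisect_left, bisect_right


def tail_scores(rows):
    """Score each row by how far it sits in the tails of every column.

    Simplified sketch (an assumption, not the paper's exact algorithm):
    per-column empirical tail probabilities are combined by summing
    negative logs, and the larger of the left/right sums is the score.
    """
    n = len(rows)
    n_cols = len(rows[0])
    left = [0.0] * n   # sum over columns of -log P(X_j <= x_ij)
    right = [0.0] * n  # sum over columns of -log P(X_j >= x_ij)
    for j in range(n_cols):
        column = sorted(row[j] for row in rows)
        for i, row in enumerate(rows):
            # Each count includes the point itself, so it is at least 1
            # and the logarithm is always defined.
            at_most = bisect_right(column, row[j])
            at_least = n - bisect_left(column, row[j])
            left[i] += -math.log(at_most / n)
            right[i] += -math.log(at_least / n)
    return [max(l, r) for l, r in zip(left, right)]


# Toy data: the last row is far from the others in both columns.
data = [[0.1, 1.0], [0.2, 1.1], [0.15, 0.9], [0.12, 1.05], [5.0, 9.0]]
scores = tail_scores(data)
most_extreme = max(range(len(data)), key=scores.__getitem__)
```

A higher score means the point is less probable under the empirical tails, which is the sense in which the score behaves like a negative log p-value. Because the per-column contributions are kept as a sum, they can also be inspected individually to see which dimension drives a high score.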


Updated: 2020-09-22
