A multivariate extreme value theory approach to anomaly clustering and visualization
Computational Statistics (IF 1.0), Pub Date: 2019-07-20, DOI: 10.1007/s00180-019-00913-y
Maël Chiapino, Stephan Clémençon, Vincent Feuillard, Anne Sabourin

In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector \(\mathbf{X} = (X_1, \ldots, X_d)\) valued in \(\mathbb{R}^d\), correspond to the simultaneous occurrence of extreme values for certain subgroups \(\alpha \subset \{1, \ldots, d\}\) of variables \(X_j\). Under the heavy-tail assumption, which is particularly appropriate for modeling these phenomena, statistical methods relying on multivariate extreme value theory have been developed in the past few years for identifying such events/subgroups. This paper takes this approach much further by means of a novel mixture model that describes the distribution of extremal observations and in which the anomaly type \(\alpha\) is viewed as a latent variable. The model can then be exploited by assigning to any extreme point a posterior probability for each anomaly type \(\alpha\), which implicitly defines a similarity measure between anomalies. It is explained at length how this similarity makes it possible to cluster extreme observations and to obtain an informative planar representation of anomalies using standard graph-mining tools. The relevance and usefulness of the clustering and 2-d visual display thus designed are illustrated on simulated datasets as well as on real observations from the aeronautics application domain.
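To make the pipeline above more concrete, here is a minimal, stdlib-only sketch under stated assumptions; it is not the authors' model. It rank-transforms each margin to approximately unit-Pareto scale, keeps the points with the largest sup-norm, and takes their angular parts. It then assigns posterior probabilities over a hand-supplied list of subgroups \(\alpha\) using a deliberately simple *toy* scoring rule that stands in for the paper's fitted mixture model. The similarity between two extremes is taken to be \(s(x, y) = \sum_\alpha p(\alpha \mid x)\, p(\alpha \mid y)\), the probability that both share the same latent type when each type is drawn independently from its posterior; this is one natural choice, assumed here. Clustering is done by connected components of a thresholded similarity graph, a simple stand-in for "standard graph-mining tools". All function names (`pareto_standardize`, `extreme_angles`, `toy_posteriors`, `similarity`, `threshold_clusters`) and the parameters `k`, `eps`, `tau` and `weights` are hypothetical.

```python
from itertools import combinations


def pareto_standardize(data):
    """Rank-transform each column to roughly unit-Pareto margins:
    v = 1 / (1 - rank / (n + 1)), with ranks 1..n inside each column."""
    n, d = len(data), len(data[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            out[i][j] = 1.0 / (1.0 - rank / (n + 1))
    return out


def extreme_angles(v, k):
    """Keep the k points with the largest sup-norm and return their indices
    together with their angular parts theta = v / ||v||_inf (values in (0, 1])."""
    radii = [max(row) for row in v]
    top = sorted(range(len(v)), key=lambda i: radii[i], reverse=True)[:k]
    return top, [[x / radii[i] for x in v[i]] for i in top]


def toy_posteriors(theta, subgroups, weights, eps=0.1):
    """HYPOTHETICAL stand-in for the paper's mixture model: theta is scored as
    'extreme on exactly alpha' via prod_{j in alpha} theta_j times
    prod_{j not in alpha} (1 - theta_j + eps), multiplied by a prior weight,
    and the scores are normalised into posterior probabilities."""
    scores = []
    for alpha, w in zip(subgroups, weights):
        s = w
        for j, t in enumerate(theta):
            s *= t if j in alpha else (1.0 - t + eps)
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]


def similarity(p, q):
    """Probability that two points share the same latent type when each type
    is drawn independently from the corresponding posterior vector."""
    return sum(a * b for a, b in zip(p, q))


def threshold_clusters(posteriors, tau):
    """Connected components (union-find) of the graph that links two points
    whenever their similarity is at least tau."""
    n = len(posteriors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, k in combinations(range(n), 2):
        if similarity(posteriors[i], posteriors[k]) >= tau:
            parent[find(i)] = find(k)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In an actual analysis the candidate subgroups and their weights would come from an estimation step rather than being supplied by hand. For the planar display, one could feed the same similarity matrix to a standard graph-layout algorithm.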

Updated: 2019-07-20