Tracking clusters and anomalies in evolving data streams
Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2021-10-08, DOI: 10.1002/sam.11552
Sreelekha Guggilam, Varun Chandola, Abani Patra

Data-driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.
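
The abstract does not give INCAD's prior or inference procedure, so the following is only a minimal sketch of the Chinese restaurant process, a standard way to view the cluster-assignment prior of a Dirichlet process mixture model. It illustrates why such a model suits streams: each arriving point can join an existing cluster or open a new one, so the number of clusters need not be fixed in advance. The function names `crp_probabilities` and `simulate_crp` and the concentration parameter `alpha` are assumptions chosen for illustration, not names from the paper.

```python
import random


def crp_probabilities(cluster_sizes, alpha):
    """Prior assignment probabilities for the next point under a CRP.

    Returns one probability per existing cluster (proportional to its
    size), followed by the probability of opening a new cluster
    (proportional to alpha). Illustrative only; not the INCAD prior.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(cluster_sizes) + alpha
    probs = [size / total for size in cluster_sizes]
    probs.append(alpha / total)  # last entry: brand-new cluster
    return probs


def simulate_crp(n_points, alpha, seed=0):
    """Draw cluster sizes for n_points arriving one at a time."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_points):
        probs = crp_probabilities(sizes, alpha)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(sizes):
            sizes.append(1)      # the point opens a new cluster
        else:
            sizes[choice] += 1   # the point joins an existing cluster
    return sizes


if __name__ == "__main__":
    # Two clusters of sizes 5 and 3, concentration alpha = 2.
    print(crp_probabilities([5, 3], alpha=2.0))
    print(simulate_crp(100, alpha=1.0))
```

A larger `alpha` makes new clusters more likely. In a full mixture model these prior weights would be combined with each cluster's data likelihood before a point is assigned, a step this sketch leaves out.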

Updated: 2021-10-08