Categorical anomaly detection in heterogeneous data using minimum description length clustering, arXiv - CS - Databases

Categorical anomaly detection in heterogeneous data using minimum description length clustering
arXiv - CS - Databases Pub Date: 2020-06-14, DOI: arxiv-2006.07916
James Cheney, Xavier Gombau, Ghita Berrada and Sidahmed Benabderrahmane

Fast and effective unsupervised anomaly detection algorithms have been proposed for categorical data based on the minimum description length (MDL) principle. However, they can be ineffective when detecting anomalies in heterogeneous datasets representing a mixture of different sources, such as security scenarios in which system and user processes have distinct behavior patterns. We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data by fitting a mixture model to the data, via a variant of k-means clustering. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms, while mixtures of more sophisticated models yield further gains, on both synthetic datasets and realistic datasets from a security scenario.
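The idea of fitting a mixture of MDL models with a k-means-style loop can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: each cluster's model is a set of independent per-attribute categorical distributions with add-one smoothing, the initial assignment uses farthest-first seeding on Hamming distance, and all function names (`fit_mixture`, `anomaly_score`, and so on) are hypothetical. A record's anomaly score is taken to be its shortest code length, in bits, over all clusters.

```python
import math
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def initial_assignment(rows, k):
    """Farthest-first seeding on Hamming distance (assumed, deterministic init)."""
    seeds = [rows[0]]
    while len(seeds) < k:
        seeds.append(max(rows, key=lambda r: min(hamming(r, s) for s in seeds)))
    return [min(range(k), key=lambda c: hamming(r, seeds[c])) for r in rows]

def fit_model(cluster_rows, domains, alpha=1.0):
    """Per-attribute categorical distributions with add-alpha smoothing."""
    model = []
    for j, domain in enumerate(domains):
        counts = Counter(r[j] for r in cluster_rows)
        total = len(cluster_rows) + alpha * len(domain)
        model.append({v: (counts[v] + alpha) / total for v in domain})
    return model

def code_length(row, model, weight):
    """Bits to name the cluster plus bits to encode the row under its model."""
    bits_for_row = -sum(math.log2(model[j][v]) for j, v in enumerate(row))
    return -math.log2(weight) + bits_for_row

def fit_mixture(rows, k, max_iters=20):
    """Alternate between refitting cluster models and reassigning each row
    to the cluster that encodes it most cheaply (a k-means-style loop)."""
    domains = [sorted({r[j] for r in rows}) for j in range(len(rows[0]))]
    assign = initial_assignment(rows, k)
    for _ in range(max_iters):
        clusters = [[r for r, a in zip(rows, assign) if a == c] for c in range(k)]
        models = [fit_model(c, domains) for c in clusters]
        # Smoothed mixture weights, so an empty cluster keeps a nonzero weight.
        weights = [(len(c) + 1) / (len(rows) + k) for c in clusters]
        new_assign = [
            min(range(k), key=lambda c: code_length(r, models[c], weights[c]))
            for r in rows
        ]
        converged = new_assign == assign
        assign = new_assign
        if converged:
            break
    return models, weights, assign

def anomaly_score(row, models, weights):
    """Shortest code length over all clusters; higher means more anomalous.
    Only defined for attribute values that occurred in the training rows."""
    return min(code_length(row, m, w) for m, w in zip(models, weights))

# Toy data: two sources with distinct behavior, plus one record mixing both.
rows = (
    [("root", "daemon", "tcp")] * 20
    + [("alice", "shell", "none")] * 20
    + [("root", "shell", "tcp")]
)
models, weights, assign = fit_mixture(rows, k=2)
scores = [anomaly_score(r, models, weights) for r in rows]
```

In this toy dataset the last record combines a system-like user and protocol with a user-like process type, so neither cluster encodes it cheaply. A single model fitted to all rows would see every attribute value as roughly equally common and could not separate it from the rest, which is the heterogeneity problem the abstract describes.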

Updated: 2020-06-16
