Layer-constrained variational autoencoding kernel density estimation model for anomaly detection, Knowledge-Based Systems


Knowledge-Based Systems (IF 7.2), Pub Date: 2020-03-10, DOI: 10.1016/j.knosys.2020.105753
Peng Lv, Yanwei Yu, Yangyang Fan, Xianfeng Tang, Xiangrong Tong

Unsupervised techniques typically rely on the probability density distribution of the data to detect anomalies: objects with low probability density are considered abnormal. However, modeling the density distribution of high-dimensional data is known to be hard, which makes detecting anomalies in such data challenging. State-of-the-art methods address this problem by first applying dimension reduction techniques to the data and then detecting anomalies in the low-dimensional space. Unfortunately, the low-dimensional space does not necessarily preserve the density distribution of the original high-dimensional data, which jeopardizes the effectiveness of anomaly detection. In this work, we propose a novel high-dimensional anomaly detection method called LAKE. The key idea of LAKE is to unify the representation learning capacity of a layer-constrained variational autoencoder with the density estimation power of kernel density estimation (KDE). A probability density distribution of the high-dimensional data can then be learned that effectively separates out the anomalies. LAKE consolidates the merits of the two worlds, namely the layer-constrained variational autoencoder and KDE, by using a probability density-aware strategy in the training process of the autoencoder. Extensive experiments on six public benchmark datasets demonstrate that our method significantly outperforms state-of-the-art methods in detecting anomalies and achieves up to 37% improvement in $F_{1}$ score.
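The density-based scoring idea in the abstract (low estimated density means more likely anomalous) can be illustrated with a minimal sketch. This is not the LAKE implementation: there is no autoencoder here, the 2-D points merely stand in for learned representations, and the function names, the isotropic Gaussian kernel, and the bandwidth of 0.5 are all assumptions made for illustration.

```python
import math
import random


def gaussian_kde_log_density(query, samples, bandwidth):
    """Log of a Gaussian KDE estimate at `query`, given reference `samples`."""
    d = len(query)
    # Log normalizing constant of an isotropic Gaussian kernel of width `bandwidth`.
    log_norm = -0.5 * d * math.log(2.0 * math.pi) - d * math.log(bandwidth)
    exponents = []
    for s in samples:
        sq_dist = sum((q - x) ** 2 for q, x in zip(query, s))
        exponents.append(-0.5 * sq_dist / bandwidth ** 2)
    # Log-sum-exp keeps the computation stable for points far from all samples.
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return log_norm + log_sum - math.log(len(samples))


def anomaly_scores(points, reference, bandwidth=0.5):
    """Negative log-density: a higher score means lower density, i.e. more anomalous."""
    return [-gaussian_kde_log_density(p, reference, bandwidth) for p in points]


# Hypothetical data: 200 "normal" points drawn from a standard 2-D Gaussian.
rng = random.Random(0)
normal = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

# One candidate near the bulk of the data, one far away from it.
candidates = [(0.1, -0.2), (6.0, 6.0)]
scores = anomaly_scores(candidates, normal)
print(scores)
```

The far-away candidate receives the larger score. In practice a threshold on the score (for example, a chosen percentile of scores on the training data) would decide which objects are flagged; how LAKE sets that threshold is not stated in the abstract.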


Updated: 2020-03-10
