Unsupervised and scalable subsequence anomaly detection in large data series, The VLDB Journal



The VLDB Journal (IF 2.8), Pub Date: 2021-03-03, DOI: 10.1007/s00778-021-00655-8
Paul Boniol, Michele Linardi, Federico Roncallo, Themis Palpanas, Mohammed Meftah, Emmanuel Remy

Subsequence anomaly (or outlier) detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches proposed so far in the literature have severe limitations: they either require prior domain knowledge or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems and propose NormA, a novel approach suitable for domain-agnostic anomaly detection. NormA is based on a new data series primitive, which makes it possible to detect anomalies based on their (dis)similarity to a model that represents normal behavior. The experimental results on several real datasets demonstrate that the proposed approach correctly identifies all single and recurrent anomalies of various types, with no prior knowledge of the characteristics of these anomalies (except for their length). Moreover, it outperforms the current state-of-the-art algorithms by a large margin in terms of accuracy, while being orders of magnitude faster.
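
To make the idea of scoring subsequences by their dissimilarity to a model of normal behavior more concrete, the following is a minimal Python sketch. It is not the NormA algorithm: the abstract does not describe how the normal model is built, so this sketch substitutes a simple leader clustering of z-normalized subsequences and keeps only the frequent clusters as the "normal model". The function names (`znorm`, `build_normal_model`, `anomaly_scores`) and the `radius` and `min_weight` parameters are hypothetical and chosen only for illustration. As in the abstract, the only thing assumed to be known about the anomalies is their length.

```python
import math

def znorm(window):
    """Z-normalize a window; a (near-)constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std < 1e-8:
        return [0.0] * n
    return [(x - mean) / std for x in window]

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_normal_model(subseqs, radius, min_weight):
    """Illustrative stand-in for a normal model (an assumption, not NormA):
    leader clustering, where the first member of each cluster is its
    representative, followed by keeping only the frequent clusters."""
    leaders, counts = [], []
    for s in subseqs:
        for i, leader in enumerate(leaders):
            if dist(s, leader) <= radius:
                counts[i] += 1
                break
        else:  # no existing leader is close enough: start a new cluster
            leaders.append(s)
            counts.append(1)
    total = len(subseqs)
    return [l for l, c in zip(leaders, counts) if c / total >= min_weight]

def anomaly_scores(series, length, radius=2.0, min_weight=0.02):
    """Score each subsequence by its distance to the closest frequent pattern."""
    subseqs = [znorm(series[i:i + length])
               for i in range(len(series) - length + 1)]
    model = build_normal_model(subseqs, radius, min_weight)
    if not model:
        raise ValueError("no frequent pattern found; adjust radius or min_weight")
    return [min(dist(s, m) for m in model) for s in subseqs]

# Synthetic example: a sine wave with one injected higher-frequency segment.
period, length = 20, 20
series = [math.sin(2 * math.pi * t / period) for t in range(400)]
for t in range(200, 220):
    series[t] = math.sin(2 * math.pi * 3 * t / period)

scores = anomaly_scores(series, length)
best = max(range(len(scores)), key=scores.__getitem__)
print("highest-scoring subsequence starts at index", best)
```

Because rare patterns are excluded from the model by `min_weight`, an anomaly that recurs a few times would still score high in this sketch, unlike in a pure nearest-neighbor scheme where a repeated anomaly can find a close match in its own repetition. The thresholds here were picked by hand for the synthetic series and would need tuning on real data.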


Updated: 2021-03-03
