Online sequential extreme studentized deviate tests for anomaly detection in streaming data with varying patterns
Cluster Computing (IF 3.6), Pub Date: 2021-01-29, DOI: 10.1007/s10586-021-03236-0
Minho Ryu , Geonseok Lee , Kichun Lee

In the new era of big data, numerous information and technology systems store huge amounts of streaming data in real time, for example, in server-access logs on web application servers. Anomaly detection in the large volumes of streaming data produced by such systems is rapidly becoming more important. One of the biggest challenges in this task is to carry out real-time contextual anomaly detection in streaming data with varying patterns that are visually detectable but unsuitable for a parametric model. Most anomaly detection algorithms have weaknesses in dealing with streaming time-series data containing such patterns. In this paper, we propose a novel method for online contextual anomaly detection in streaming time-series data using generalized extreme studentized deviate (GESD) tests. The GESD test is relatively accurate and efficient because it performs statistical hypothesis testing, but it cannot handle streaming time-series data. Thus, focusing on streaming time-series data, we propose an online version of the test capable of detecting outliers under varying patterns. We perform extensive experiments with simulated data, synthetic data, and real online traffic data from Yahoo Webscope, showing a clear advantage of the proposed method, particularly for analyzing streaming data with varying patterns.
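To make the hypothesis-testing step concrete, the following is a minimal sketch of the standard batch GESD procedure (Rosner's test). It assumes the data are approximately normally distributed. It also includes a naive sliding-window wrapper that shows why a batch test does not directly fit streaming data. All names (`gesd`, `sliding_window_gesd`, `t_ppf`, `t_cdf`) are hypothetical. To keep the sketch stdlib-only, the Student-t quantile is computed with an incomplete-beta routine instead of a statistics library. Neither function is the authors' online method, which the abstract does not specify.

```python
import math
import statistics
from collections import deque


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Lentz continued fraction for the regularized incomplete beta function."""
    tiny = 1e-300

    def nz(v):  # keep denominators away from zero
        return v if abs(v) > tiny else tiny

    c, d = 1.0, 1.0 / nz(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        num = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))  # even term
        d, c = 1.0 / nz(1.0 + num * d), nz(1.0 + num / c)
        h *= d * c
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))  # odd term
        d, c = 1.0 / nz(1.0 + num * d), nz(1.0 + num / c)
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """Student-t CDF via the identity with I_{df/(df+t^2)}(df/2, 1/2)."""
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - tail if t >= 0 else tail


def t_ppf(p, df):
    """Upper quantile (p >= 0.5) of Student's t, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gesd(values, max_outliers, alpha=0.05):
    """Rosner's generalized ESD test; returns sorted indices of outliers."""
    x = list(values)
    n = len(x)
    if max_outliers < 1 or n - max_outliers < 2:
        raise ValueError("need 1 <= max_outliers <= n - 2")
    remaining = list(range(n))
    removed, num_outliers = [], 0
    for i in range(1, max_outliers + 1):
        sample = [x[j] for j in remaining]
        mean, sd = statistics.fmean(sample), statistics.stdev(sample)
        if sd == 0.0:
            break
        idx = max(remaining, key=lambda j: abs(x[j] - mean))
        r_i = abs(x[idx] - mean) / sd            # test statistic R_i
        m = n - i + 1                            # current sample size
        t = t_ppf(1.0 - alpha / (2.0 * m), m - 2)
        lam = (m - 1) * t / math.sqrt((m - 2 + t * t) * m)  # critical value
        if r_i > lam:
            num_outliers = i                     # largest i with R_i > lambda_i
        removed.append(idx)
        remaining.remove(idx)
    return sorted(removed[:num_outliers])


def sliding_window_gesd(stream, window, max_outliers, alpha=0.05):
    """Hypothetical naive streaming wrapper, NOT the paper's online method."""
    buf, flagged = deque(maxlen=window), []
    for pos, value in enumerate(stream):
        buf.append(value)
        if len(buf) == window and window - 1 in gesd(buf, max_outliers, alpha):
            flagged.append(pos)  # newest point is an outlier within its window
    return flagged
```

Consider a series of values near 10 with a single spike of 25.0 at index 10. For that series, `gesd(data, 3)` returns `[10]`, and `sliding_window_gesd(data, window=10, max_outliers=2)` flags position 10. The wrapper re-runs the whole test on every arrival. That costs roughly O(max_outliers × window) arithmetic per point. It also treats the window as if it had no trend or seasonality, so it only illustrates the gap the abstract describes and does not address varying patterns.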

Updated: 2021-01-31