Efficient subspace search in data streams
Information Systems (IF 3.7), Pub Date: 2020-12-17, DOI: 10.1016/j.is.2020.101705
Edouard Fouché, Florian Kalinke, Klemens Böhm

In the real world, data streams are ubiquitous — think of network traffic or sensor data. Mining patterns, e.g., outliers or clusters, from such data must take place in real time. This is challenging because (1) streams often have high dimensionality, and (2) the data characteristics may change over time. Existing approaches tend to focus on only one aspect, either high dimensionality or the specifics of the streaming setting. For static data, a common approach to deal with high dimensionality – known as subspace search – extracts low-dimensional, ‘interesting’ projections (subspaces), in which patterns are easier to find. In this paper, we address both Challenge (1) and (2) by generalising subspace search to data streams. Our approach, Streaming Greedy Maximum Random Deviation (SGMRD), monitors interesting subspaces in high-dimensional data streams. It leverages novel multivariate dependency estimators and monitoring techniques based on bandit theory. We show that the benefits of SGMRD are twofold: (i) It monitors subspaces efficiently, and (ii) this improves the results of downstream data mining tasks, such as outlier detection. Our experiments, performed against synthetic and real-world data, demonstrate that SGMRD outperforms its competitors by a large margin.




Updated: 2020-12-30