Online Data Thinning via Multi-Subspace Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2018-04-20, DOI: 10.1109/tpami.2018.2829189
Xin J. Hunt, Rebecca Willett

In an era of ubiquitous large-scale streaming data, the volume of available data far exceeds the capacity of expert human analysts. In many settings, such data are either discarded or stored unprocessed in data centers. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of the proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices of the Gaussian components are represented by low-rank models, so that most observations lie near a union of subspaces. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering and subspace tracking allow the proposed method to adapt to dynamic environments. Furthermore, the proposed method allows subsampling, is robust to missing data, and uses a mini-batch online optimization approach. The resulting algorithms are scalable, efficient, and capable of operating in real time. Experiments on wide-area motion imagery and e-mail databases illustrate the efficacy of the proposed approach.
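
The union-of-subspaces idea can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the subspaces are already known and fixed, each given as an orthonormal basis, and it omits the mixture weights, the subspace tracking updates, the missing-data handling, and the mini-batch optimization described in the abstract. The function names (`residual_norm`, `anomaly_score`, `thin`) and the fixed threshold are hypothetical choices made for illustration only.

```python
import math

def residual_norm(x, basis):
    """Distance from x to the subspace spanned by `basis`.

    Assumption: `basis` is a list of orthonormal vectors, so the
    projection is the sum of (u . x) u over the basis vectors.
    """
    r = list(x)
    for u in basis:
        coeff = sum(ui * xi for ui, xi in zip(u, x))
        r = [ri - coeff * ui for ri, ui in zip(r, u)]
    return math.sqrt(sum(ri * ri for ri in r))

def anomaly_score(x, subspaces):
    """Distance from x to the nearest subspace in the union."""
    return min(residual_norm(x, basis) for basis in subspaces)

def thin(stream, subspaces, threshold):
    """Keep only observations that lie far from every subspace."""
    return [x for x in stream if anomaly_score(x, subspaces) > threshold]

# Two one-dimensional subspaces (the first two coordinate axes) in R^3.
subspaces = [
    [(1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0)],
]

stream = [
    (2.0, 0.1, 0.0),    # close to the first axis
    (0.0, -3.0, 0.05),  # close to the second axis
    (1.0, 1.0, 1.0),    # far from both axes
]

kept = thin(stream, subspaces, threshold=0.5)
print(kept)
```

Under these assumptions, only the third observation survives the thinning step, since the first two lie within the threshold of one of the subspaces. A streaming method of the kind the abstract describes would additionally update the subspaces as new data arrive rather than treating them as fixed.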

Updated: 2024-08-22