Online dependence clustering of multivariate streaming data using one-class SVMs
International Journal of Intelligent Systems (IF 7), Pub Date: 2021-10-13, DOI: 10.1002/int.22716
Geonseok Lee, Kichun Lee

Online clustering of multivariate streaming data has attracted considerable interest in recent years due to the abundance of data sources. Numerous studies have been performed in this field, but they usually suffer from practical problems associated with discovering arbitrarily shaped clusters, specifying major parameters in advance, and detecting aberrant observations. Addressing these issues is important for online-clustering tasks, where data arrive in continuous streams and group behaviors change simultaneously. In this paper, we propose a kernel-based online dependence clustering method, KODC, that not only estimates cluster membership using one-class support vector machines (OC-SVMs), but also detects outliers distant from the identified clusters by aggregating OC-SVM decisions in real time. At the base level, we use a new measure of connective dependence that forms a graph connected via modified Markovian transitions to enable large-scale clustering. The proposed framework introduces a coherence threshold to extract data points that can represent the cluster to which they belong, thus controlling the computational complexity without degrading the clustering performance. To track pattern evolution over time, KODC also updates the classifier configuration so as to maximize the total group connective dependence. We evaluate this framework on several synthetic and real-world data sets involving multivariate streaming data, and compare it experimentally with other popular online-clustering methods in terms of four evaluation metrics. The results show that our framework effectively identifies clusters and outliers, especially in data of various shapes that change over time, without requiring any prior knowledge of the data.
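
The idea of aggregating per-cluster one-class decisions can be illustrated with a minimal sketch. This is not the KODC algorithm: the abstract does not specify the aggregation rule, so the rule used here (assign a point to the cluster with the highest positive score, and flag it as an outlier when every cluster rejects it) is an assumption. The scorer is also a stand-in. It has the same functional form as an OC-SVM decision function, a kernel-weighted sum over reference points minus an offset, but it uses uniform weights and a hand-picked offset `rho` instead of solving the OC-SVM optimization. All names (`rbf`, `decision`, `assign`, `clusters`) and parameter values are hypothetical.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision(point, representatives, rho=0.3, gamma=1.0):
    """Stand-in one-class score for a single cluster.

    Mean kernel similarity to the cluster's representative points minus an
    offset rho; a positive value means the cluster accepts the point.
    A trained OC-SVM would learn the weights and the offset instead.
    """
    sim = sum(rbf(point, r, gamma) for r in representatives) / len(representatives)
    return sim - rho

def assign(point, clusters, rho=0.3, gamma=1.0):
    """Aggregate the per-cluster decisions for one incoming point.

    Returns the label of the cluster with the highest positive score,
    or None (outlier) if every cluster rejects the point.
    """
    scores = {label: decision(point, reps, rho, gamma)
              for label, reps in clusters.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical representative points for two clusters (assumed, not from the paper).
clusters = {
    "A": [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)],
    "B": [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5)],
}

for p in [(0.2, 0.1), (5.1, 5.2), (2.5, 2.5)]:
    print(p, "->", assign(p, clusters))
```

In this toy setting the first two points fall inside clusters A and B respectively, while the third lies far from both sets of representatives and is reported as an outlier (`None`). Keeping only a few representative points per cluster loosely mirrors the role the abstract gives to the coherence threshold, which limits how many points each classifier must store.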

Updated: 2021-10-13