Ant Colony Stream Clustering: A Fast Density Clustering Algorithm for Dynamic Data Streams
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 5-10-2018, DOI: 10.1109/tcyb.2018.2822552
Conor Fahy, Shengxiang Yang, Mario Gongora

A data stream is a continuously arriving sequence of data, and clustering data streams requires considerations beyond those of traditional clustering. A stream is potentially unbounded, data points arrive online, and each data point can be examined only once. This imposes limitations on available memory and processing time. Furthermore, streams can be noisy, and the number of clusters in the data and their statistical properties can change over time. This paper presents an online, bio-inspired approach to clustering dynamic data streams. The proposed ant colony stream clustering (ACSC) algorithm is a density-based clustering algorithm, whereby clusters are identified as high-density areas of the feature space separated by low-density areas. ACSC identifies clusters as groups of micro-clusters. The tumbling window model is used to read a stream, and rough clusters are incrementally formed during a single pass of a window. A stochastic method is employed to find these rough clusters; this is shown to significantly speed up the algorithm with only a minor cost to performance, as compared to a deterministic approach. The rough clusters are then refined using a method inspired by the observed sorting behavior of ants. Ants pick up and drop items based on their similarity to the surrounding items. Artificial ants sort clusters by probabilistically picking up and dropping micro-clusters based on local density and local similarity. Clusters are summarized using their constituent micro-clusters and these summary statistics are stored offline. Experimental results show that ACSC is scalable and robust to noise, and that its clustering quality compares favorably with leading ant clustering and stream-clustering algorithms. It also requires fewer parameters and less computational time.
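
Two ideas from the abstract can be illustrated with a minimal Python sketch: reading a stream in tumbling (non-overlapping) windows, and probabilistic pick-up and drop decisions driven by local similarity. This is not the ACSC implementation. The abstract does not give the paper's actual probability rules, so the sketch substitutes the classic pick-up and drop functions from the basic ant-sorting model of Deneubourg et al. The names `tumbling_windows`, `pick_probability`, `drop_probability`, the constants `k_pick` and `k_drop`, and the similarity score `f` in [0, 1] are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def tumbling_windows(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping windows of up to `size` points.

    Each point is read exactly once, matching the single-pass constraint
    described in the abstract.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    window: List[T] = []
    for point in stream:
        window.append(point)
        if len(window) == size:
            yield window
            window = []
    if window:
        # Emit the final, possibly shorter, window.
        yield window


def pick_probability(f: float, k_pick: float = 0.1) -> float:
    """Probability of picking up an item whose local similarity is `f`.

    Classic Deneubourg-style rule (an assumption, not the ACSC rule):
    items that fit poorly with their neighbours (low f) are likely
    to be picked up.
    """
    return (k_pick / (k_pick + f)) ** 2


def drop_probability(f: float, k_drop: float = 0.3) -> float:
    """Probability of dropping a carried item where local similarity is `f`.

    Classic Deneubourg-style rule (an assumption, not the ACSC rule):
    items are likely to be dropped among similar neighbours (high f).
    """
    return (f / (k_drop + f)) ** 2
```

For example, `list(tumbling_windows(range(7), 3))` splits seven points into windows of three, three, and one. An item with no similar neighbours (`f = 0.0`) is always picked up and never dropped under these rules, while a well-placed item (`f` close to 1) is rarely picked up.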

Updated: 2024-08-22