Accelerated Single Linkage Algorithm using the farthest neighbour principle
Sādhanā (IF 1.6), Pub Date: 2021-02-26, DOI: 10.1007/s12046-020-01544-6
Payel Banerjee, Amlan Chakrabarti, Tapas Kumar Ballabh

The Single Linkage algorithm is a hierarchical clustering method that is poorly suited to large datasets because of its long convergence time. The paper proposes an efficient accelerated technique that applies this algorithm to univariate data with a merging threshold. It is a two-stage algorithm. The first stage is an incremental pre-clustering step that uses the farthest neighbour principle to partially cluster the database in a single scan. The algorithm uses the Segment Addition Postulate as its main tool for accelerating this pre-clustering stage, and the incremental approach makes it suitable for partially clustering streaming data while the data is being collected. The second stage merges these pre-clusters into the final set of Single Linkage clusters by comparing the largest and smallest values of each pre-cluster, and therefore converges faster than methods that use all the members of the clusters for each clustering action. The algorithm is also suitable for fast-changing dynamic databases, because it can cluster newly added data without using all the data in the database. Experiments were conducted on various datasets, and the results confirm that the proposed algorithm outperforms its well-known variants.
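The abstract does not spell out the exact rules of either stage, so what follows is a minimal Python sketch of one plausible reading of the two-stage scheme, not the authors' implementation. Assumptions: a value joins an existing pre-cluster only if it lies within the merging threshold `t` of that pre-cluster's farthest member (our interpretation of the farthest neighbour principle). On a line every member sits between the pre-cluster's smallest and largest value, so by a Segment-Addition-style argument the farthest member is always one of those two endpoints, and the check touches only two numbers. Stage 2 sorts the pre-clusters and merges neighbours whose endpoint gap is at most `t`. The names `PreCluster`, `add_point`, `merge_pre_clusters` and `single_linkage_1d` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreCluster:
    """Hypothetical pre-cluster record: smallest value, largest value, members."""
    lo: float
    hi: float
    members: List[float] = field(default_factory=list)


def add_point(pre_clusters: List[PreCluster], x: float, t: float) -> None:
    """Stage 1: file one value into a pre-cluster (incremental, one pass).

    Assumption (our reading of the farthest neighbour principle): x may join a
    pre-cluster only if it is within t of that pre-cluster's farthest member.
    Every member m satisfies lo <= m <= hi, so on a line the member farthest
    from x is lo or hi; checking those two values covers all members.
    """
    for pc in pre_clusters:
        if max(abs(x - pc.lo), abs(x - pc.hi)) <= t:
            pc.lo = min(pc.lo, x)
            pc.hi = max(pc.hi, x)
            pc.members.append(x)
            return
    # No existing pre-cluster can admit x, so it starts a new one.
    pre_clusters.append(PreCluster(lo=x, hi=x, members=[x]))


def merge_pre_clusters(pre_clusters: List[PreCluster], t: float) -> List[List[float]]:
    """Stage 2: merge pre-clusters, comparing only their lo/hi endpoints."""
    clusters: List[List[float]] = []
    current_hi: Optional[float] = None
    for pc in sorted(pre_clusters, key=lambda p: p.lo):
        # If pc.lo <= current_hi, pc.lo falls inside the current cluster's span,
        # whose sorted values are never more than t apart; otherwise
        # pc.lo - current_hi is the exact gap. Either way this test tells
        # whether pc links to the current cluster.
        if current_hi is not None and pc.lo - current_hi <= t:
            clusters[-1].extend(pc.members)
            current_hi = max(current_hi, pc.hi)
        else:
            clusters.append(list(pc.members))
            current_hi = pc.hi
    return [sorted(c) for c in clusters]


def single_linkage_1d(data: List[float], t: float) -> List[List[float]]:
    """Single linkage clusters of univariate data under merging threshold t."""
    pre_clusters: List[PreCluster] = []
    for x in data:  # one scan; the values could equally arrive from a stream
        add_point(pre_clusters, x, t)
    return merge_pre_clusters(pre_clusters, t)


print(single_linkage_1d([1.0, 1.5, 10.0, 2.2, 10.4, 30.0], 1.0))
# [[1.0, 1.5, 2.2], [10.0, 10.4], [30.0]]
```

In this example, 2.2 is rejected by the first pre-cluster during stage 1 (it is 1.2 away from that pre-cluster's farthest member, 1.0) but is reunited with it in stage 2, because the endpoint gap 2.2 - 1.5 is within the threshold. Since `add_point` compares a new value only against pre-cluster endpoints, a newly arrived value can be filed without revisiting the raw data, and stage 2 likewise compares endpoints only.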

Updated: 2021-02-26