An automatic clustering method using multi-objective genetic algorithm with gene rearrangement and cluster merging
Applied Soft Computing (IF 8.7), Pub Date: 2020-11-24, DOI: 10.1016/j.asoc.2020.106929
Hongchun Qu, Li Yin, Xiaoming Tang

As an unsupervised machine learning approach, clustering is an important method for understanding and learning structural information from data. However, current adaptive clustering approaches based on multi-objective genetic algorithms have two apparent limitations. The first is that prior knowledge, i.e., sample information, is needed to obtain the correct cluster number. The second is that no effective method exists for selecting the best clustering solution from the Pareto Optimal Front (POF) generated by multi-objective optimization. These problems become more severe in applications on non-category datasets. Therefore, the primary goal of this research is to establish a multi-objective clustering framework based on genetic optimization, in which multiple clustering validity indexes (CVIs) can be tested simultaneously to obtain the optimal cluster number automatically, without knowing any sample label information in advance. In this effort, we consider not only clustering measurements such as cluster cohesion and separation, but also other aspects such as compactness, connectivity, and variation among data elements. Then, we aim to design a procedure that recommends the three best solutions from the POF by using an appropriate combination of CVIs, without increasing computational cost. This procedure is expected to keep the cluster number within a reasonable range and consequently reduce the difficulty of recommending the best solution. Finally, because gene rearrangement in the genetic optimization does not affect the partition, we exploit this property to merge clusters effectively and significantly speed up the convergence of the algorithm. Three experiments on both artificial and typical real-world datasets show that our approach can outperform state-of-the-art counterparts across diverse benchmark datasets in terms of partitioning accuracy and performance.
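Two ideas in the abstract can be illustrated with a small sketch: that rearranging genes leaves the partition unchanged, and that a Pareto Optimal Front is the set of non-dominated solutions. The abstract does not state the chromosome encoding, so the code below assumes a centroid-based encoding (a chromosome is an ordered list of cluster centers) and assumes all objectives are minimised. The function names `assign`, `partition`, and `pareto_front`, the toy points, and the score tuples are hypothetical and are not taken from the paper.

```python
def assign(points, centers):
    """Label each point with the index of its nearest center (squared Euclidean)."""
    def d2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centers)), key=lambda k: d2(p, centers[k])) for p in points]


def partition(labels):
    """Turn a labelling into a set of frozensets so that label names do not matter."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return {frozenset(g) for g in groups.values()}


def pareto_front(scores):
    """Return indices of non-dominated score tuples (assumes minimisation)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


# Toy data (assumed): three well-separated groups in 2-D.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]

# Assumed centroid-based chromosome and a rearranged copy of the same genes.
chromosome = [(0.0, 0.1), (5.1, 5.0), (9.0, 0.1)]
rearranged = [chromosome[2], chromosome[0], chromosome[1]]

# Reordering the centers changes label names but not the grouping of points.
same_partition = partition(assign(points, chromosome)) == partition(assign(points, rearranged))

# Hypothetical (CVI_1, CVI_2) scores for four candidate clusterings.
scores = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(scores)  # (3.0, 3.0) is dominated by (2.0, 2.0)
```

Under these assumptions, `same_partition` is true and `front` contains the three non-dominated candidates. How the paper selects its three recommended solutions from the front, and how it merges clusters, is not specified in the abstract and is not modelled here.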




Updated: 2020-11-25