An automatic three-way clustering method based on sample similarity
International Journal of Machine Learning and Cybernetics (IF 5.6), Pub Date: 2021-01-14, DOI: 10.1007/s13042-020-01255-8
Xiuyi Jia, Ya Rao, Weiwei Li, Sichun Yang, Hong Yu

Three-way clustering extends traditional clustering by adding the concept of a fringe region, which can effectively address the inaccurate decisions that traditional two-way clustering methods make when information is inaccurate or data are insufficient. Existing three-way clustering methods often select the number of clusters and the thresholds for the three-way partition by subjective tuning. However, fixing the number of clusters and the partition thresholds in this way cannot automatically yield the optimal values for data sets of different sizes and densities. To address this problem, this paper proposes an improved three-way clustering method. First, we define a roughness degree, based on sample similarity, to measure the uncertainty of the fringe region. Based on the roughness degree, we then define a novel partitioning validity index to evaluate clustering partitions and propose an automatic threshold selection method. Second, again based on sample similarity, we introduce the intra-class similarity and the inter-class similarity to describe quantitatively how the relationship between a sample and the clusters changes, and we integrate these two similarities into a novel clustering validity index that measures clustering performance under different numbers of clusters. Furthermore, we propose an automatic cluster number selection method. Finally, we obtain an automatic three-way clustering approach by combining the proposed threshold selection method with the cluster number selection method. Comparison experiments demonstrate the effectiveness of the proposal.
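
To make the core/fringe idea concrete, the following is a minimal sketch of a three-way partition and not the authors' method. The abstract does not give the paper's definitions of sample similarity, roughness degree, or the validity indices, so everything here is an assumption for illustration: the function name `three_way_partition`, the Gaussian-kernel similarity to fixed cluster centers, and the hand-picked thresholds `alpha` and `beta` (the paper selects its thresholds and cluster number automatically, which this sketch does not attempt).

```python
import numpy as np


def three_way_partition(X, centers, alpha=0.6, beta=0.3):
    """Split samples into core and fringe regions for each cluster.

    Illustrative assumptions (not taken from the paper):
    - similarity is a Gaussian kernel of the squared distance to each center,
      normalised so that each sample's similarities sum to 1;
    - a sample is in a cluster's core if its normalised similarity >= alpha,
      and in the fringe if it lies in [beta, alpha).
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")

    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)

    # Squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

    # Normalised similarity of each sample to each cluster.
    sim = np.exp(-d2)
    membership = sim / sim.sum(axis=1, keepdims=True)

    core = membership >= alpha                # definitely belongs
    fringe = (membership >= beta) & ~core     # possibly belongs
    return core, fringe


# Hypothetical toy data: two tight groups and one point halfway between them.
X = np.array([[0.0, 0.0], [0.1, 0.0], [4.0, 0.0], [4.1, 0.0], [2.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
core, fringe = three_way_partition(X, centers)
print(core)
print(fringe)
```

On this toy data the first two samples fall in the core of the first cluster, the next two in the core of the second, and the midpoint sample falls in the fringe of both clusters, which is the situation a two-way (hard) assignment cannot express.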




Updated: 2021-01-14