Fuzzy Clustering of Single-View Incomplete Data Using a Multiview Framework
IEEE Transactions on Fuzzy Systems (IF 10.7), Pub Date: 5-10-2022, DOI: 10.1109/tfuzz.2022.3173673
Suvra Jyoti Choudhury 1 , Nikhil R Pal 2

We propose four frameworks for clustering data with missing values. We first use a very simple method to impute the missing values and generate multiple imputed versions of the data. These views are then clustered together to obtain a common partition matrix and a common set of centroids. As the clustering framework, we use a multiview version of the Fuzzy-c-Means (MVFCM) and a multiview version of Kernelized Fuzzy-c-Means (MVKFCM). To find the importance (weights) of different views, we use an entropic regularization term using the weights. After obtaining the optimal weights, the final imputation is done as a weighted sum (convex combination) of the imputed values used to generate the views. The final clustering is done on this imputed data set. We compare the performance of the proposed algorithms with several algorithms using Normalized Mutual Information, Adjusted Rand Index, and cluster accuracy on 12 benchmark data sets. Of these algorithms, the MVKFCM is found to perform the best. The MVFCM and MVKFCM use $5 \times r$ views, where $r$ is the number of classes (note that the class labels are not used). However, $r$ may not be known, and also for large $r$, there will be too many views. So we propose two variants of MVKFCM: $MVKFCM_{FV}$ and $MVKFCM_{RFV}$ (FV stands for a fixed number of views and RFV stands for a robust version with fixed views). The $MVKFCM_{RFV}$ generates views in a manner that helps to obtain robust performance. As expected, $MVKFCM_{RFV}$ is found to be the best performing algorithm.
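
The abstract gives neither the exact objective function nor the update rules, so the following is only a minimal sketch of the two steps it names: entropy-regularized view weights and the final convex-combination imputation. It assumes the weights minimize $\sum_v w_v J_v + \gamma \sum_v w_v \ln w_v$ subject to $\sum_v w_v = 1$, which has the closed-form solution $w_v \propto \exp(-J_v / \gamma)$. The per-view costs `view_costs`, the parameter `gamma`, and both function names are hypothetical and may differ from the paper's formulation.

```python
import numpy as np

def entropy_view_weights(view_costs, gamma=1.0):
    """Closed-form weights for an entropy-regularized weighted sum of costs.

    Assumption (not from the paper): weights minimize
    sum_v w_v * J_v + gamma * sum_v w_v * ln(w_v) with sum_v w_v = 1,
    which gives w_v proportional to exp(-J_v / gamma).
    """
    costs = np.asarray(view_costs, dtype=float)
    # Shifting by the minimum cost leaves the weights unchanged
    # but avoids underflow in exp().
    scores = np.exp(-(costs - costs.min()) / gamma)
    return scores / scores.sum()

def combine_imputations(views, missing_mask, weights):
    """Final imputation as a convex combination of the imputed views.

    views:        array of shape (V, n, d), one imputed copy of the data per view
    missing_mask: boolean array of shape (n, d), True where the value was missing
    weights:      array of shape (V,), non-negative and summing to 1
    """
    views = np.asarray(views, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over the view axis gives an (n, d) array.
    combined = np.tensordot(weights, views, axes=1)
    # Observed entries are assumed identical in every view, so keep them as-is.
    return np.where(missing_mask, combined, views[0])
```

Under this assumed form, a view with a lower clustering cost receives a larger weight, and `gamma` controls how sharply the weights concentrate on the best view: a large `gamma` pushes the weights toward uniform.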

Updated: 2024-08-26