Toward Multidiversified Ensemble Clustering of High-Dimensional Data: From Subspaces to Metrics and Beyond
IEEE Transactions on Cybernetics ( IF 9.4 ) Pub Date : 2021-05-07 , DOI: 10.1109/tcyb.2021.3049633
Dong Huang 1 , Chang-Dong Wang 2 , Jian-Huang Lai 2 , Chee-Keong Kwoh 3

The rapid emergence of high-dimensional data in many areas poses new challenges for ensemble clustering research. To deal with the curse of dimensionality, considerable recent effort in ensemble clustering has gone into subspace-based techniques. Beyond subspaces, however, little attention has been paid to the potential diversity in similarity/dissimilarity metrics. How to create and aggregate a large population of diversified metrics, and how to jointly investigate the multilevel diversity in large populations of metrics, subspaces, and clusters within a unified framework, remain open problems in ensemble clustering. To tackle them, this article proposes a novel multidiversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, and couple these metrics with random subspaces to form a large set of metric-subspace pairs. The similarity matrices derived from these metric-subspace pairs are then used to construct an ensemble of diversified base clusterings. Furthermore, an entropy-based criterion is used to explore the cluster-wise diversity in ensembles, based on which three specific ensemble clustering algorithms are presented, each incorporating a different type of consensus function. Extensive experiments on 30 high-dimensional datasets, including 18 cancer gene expression datasets and 12 image/speech datasets, demonstrate the superiority of our algorithms over the state of the art. The source code is available at https://github.com/huangdonghere/MDEC.
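
To make the idea of metric-subspace pairs more concrete, the following Python sketch shows one way such pairs might be generated. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The kernel form, the sampling ranges for `k` and `mu`, the `subspace_ratio` parameter, and the function names `scaled_exp_similarity` and `metric_subspace_pairs` are all hypothetical choices made here; the abstract does not specify them.

```python
import numpy as np


def scaled_exp_similarity(X, k, mu):
    """Similarity matrix from one commonly used scaled exponential kernel.

    Assumed form (not taken from the abstract):
        S_ij = exp(-d_ij^2 / (mu * eps_ij)),
        eps_ij = (mean kNN distance of i + mean kNN distance of j + d_ij) / 3
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)          # remove rounding noise on the diagonal
    d = np.sqrt(d2)
    # Mean distance to the k nearest neighbours, skipping the point itself.
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    eps = np.maximum(eps, 1e-12)       # guard against division by zero
    return np.exp(-d2 / (mu * eps))


def metric_subspace_pairs(X, n_pairs, subspace_ratio=0.5, seed=None):
    """Draw random (feature subset, kernel parameters) pairs.

    Returns a list of (feature_indices, k, mu, similarity_matrix) tuples.
    The sampling ranges below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_sub = max(1, int(round(subspace_ratio * n_features)))
    pairs = []
    for _ in range(n_pairs):
        # Random subspace: a feature subset sampled without replacement.
        feats = rng.choice(n_features, size=n_sub, replace=False)
        # Randomized metric: perturb the kernel's two parameters.
        k = int(rng.integers(5, 16))   # assumed range 5..15
        k = min(k, n_samples - 1)
        mu = float(rng.uniform(0.3, 0.8))  # assumed range
        S = scaled_exp_similarity(X[:, feats], k, mu)
        pairs.append((feats, k, mu, S))
    return pairs
```

Each similarity matrix returned this way could then be passed to a similarity-based clusterer to produce one base clustering. That step, the entropy-based criterion, and the consensus functions are left out here because the abstract does not describe them in enough detail to sketch faithfully.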

Updated: 2021-05-07