CGC
ACM Transactions on Knowledge Discovery from Data ( IF 4.0 ) Pub Date : 2016-05-25 , DOI: 10.1145/2903147
Wei Cheng 1 , Zhishan Guo 1 , Xiang Zhang 2 , Wei Wang 3

Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been shown to be an effective way to achieve better clustering results. Despite this success, existing multi-view graph clustering methods usually assume that different views are available for the same set of instances, so that instances in different domains can be treated as having a strict one-to-one relationship. In many real-life applications, however, a data instance in one domain may correspond to multiple instances in another domain. Moreover, relationships between instances in different domains may carry weights based on prior (partial) knowledge. In this article, we propose a flexible and robust framework, Co-regularized Graph Clustering (CGC), based on non-negative matrix factorization (NMF), to tackle these challenges. CGC has several advantages over existing methods. First, it supports many-to-many cross-domain instance relationships. Second, it incorporates weights on cross-domain relationships. Third, it allows partial cross-domain mapping, so graphs in different domains may have different sizes. Finally, it reports the extent to which the cross-domain instance relationship violates the in-domain clustering structure, which enables users to re-evaluate the consistency of the relationship. We develop an efficient optimization method that is guaranteed to find the global optimal solution with a given confidence requirement. The proposed method can automatically identify noisy domains and assign smaller weights to them, which helps to obtain an optimal graph partition for the focused domain. Extensive experimental results on UCI benchmark datasets, newsgroup datasets, and biological interaction networks demonstrate the effectiveness of our approach.
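To make the ingredients named in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm or optimizer. It assumes (1) each domain is clustered by a plain symmetric NMF, A ≈ H Hᵀ with H ≥ 0, fitted with a standard damped multiplicative update, and (2) cross-domain disagreement is measured by a squared Frobenius residual between the mapped and the native cluster structure. The function names `sym_nmf` and `cross_domain_residual`, the matrix `S`, and the toy graphs are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sym_nmf(A, k, iters=500, seed=0, eps=1e-9):
    """Approximate a symmetric affinity matrix A by H @ H.T with H >= 0.

    Stand-in single-domain clusterer (an assumption, not the article's
    optimizer): damped multiplicative updates for symmetric NMF.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k)) + 0.1  # strictly positive start
    for _ in range(iters):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps avoids division by zero
        H *= 0.5 + 0.5 * numer / denom
    return H


def cross_domain_residual(S, H_a, H_b):
    """Disagreement between domain a (mapped through S) and domain b.

    S[j, i] >= 0 is the weight linking instance i of domain a to instance j
    of domain b, so S may be many-to-many and weighted. Rows of S that are
    all zero (instances of b with no known counterpart) are skipped, which
    lets the mapping be partial and the two graphs differ in size.
    Comparing H @ H.T makes the value independent of cluster label order.
    """
    mapped = S.sum(axis=1) > 0
    P = S[mapped] @ H_a  # domain-a memberships projected into domain b
    Q = H_b[mapped]
    return float(np.linalg.norm(P @ P.T - Q @ Q.T, "fro") ** 2)


# Toy data (hypothetical): domain a has 4 instances, domain b has 3.
A_a = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
A_b = np.array([[1, 0, 0],
                [0, 1, 1],
                [0, 1, 1]], dtype=float)
S = np.array([[0.5, 0.5, 0.0, 0.0],   # b0 is linked to a0 and a1
              [0.0, 0.0, 1.0, 0.0],   # b1 is linked to a2
              [0.0, 0.0, 0.0, 0.0]])  # b2 has no known counterpart

H_a = sym_nmf(A_a, k=2)
H_b = sym_nmf(A_b, k=2)
labels_a = H_a.argmax(axis=1)  # hard cluster label per instance
labels_b = H_b.argmax(axis=1)
residual = cross_domain_residual(S, H_a, H_b)
print(labels_a, labels_b, residual)
```

A small residual suggests that the supplied cross-domain relationship agrees with the clustering found inside each domain; a large one flags a relationship worth re-examining. A joint method such as the one the abstract describes would optimize the in-domain fit and a cross-domain penalty together, with per-domain weights, which this sketch deliberately leaves out.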

Updated: 2016-05-25