Multiple clusterings of heterogeneous information networks
Machine Learning (IF 7.5), Pub Date: 2021-06-02, DOI: 10.1007/s10994-021-06000-y
Shaowei Wei, Guoxian Yu, Jun Wang, Carlotta Domeniconi, Xiangliang Zhang

Traditional clustering algorithms produce a single clustering result and therefore cannot explore the diverse patterns that may be present in complex real-world data. To address this problem, approaches that seek meaningful alternative clusterings of the data have been developed in recent years. Existing algorithms, including single-view and multi-view multiple clustering methods, are designed for applications with i.i.d. data samples and cannot handle samples with the dependencies present in networks, especially in heterogeneous information networks (HINs). In this paper, we propose a framework (NetMCs) that can explore multiple clusterings in an HIN. Specifically, NetMCs adopts a set of meta-path schemes with different semantics on the HIN and treats each meta-path scheme as a base clustering aspect. Guided by the meta-path schemes, NetMCs then introduces a variation of the skip-gram framework that jointly optimizes multiple clustering aspects and simultaneously obtains the respective embedding representations and the individual clusterings therein. To reduce redundancy between alternative clusterings, NetMCs uses an explicit regularization term to control the embedding diversity of the same nodes across different clustering aspects. Experiments on benchmark HIN datasets confirm that NetMCs generates multiple clusterings with high quality and diversity.
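
To make the idea of one clustering aspect per meta-path scheme more concrete, the following is a minimal sketch of meta-path-guided random walks and skip-gram training pairs on a toy HIN. It is not the NetMCs implementation: the toy graph, the node types (A for author, P for paper, V for venue), the schemes "APA" and "APVPA", and every function name are assumptions made for illustration, and the joint optimization and the diversity regularization term described in the abstract are not shown.

```python
import random

# Hypothetical toy HIN: authors (A), papers (P), venues (V).
NODE_TYPE = {
    "a1": "A", "a2": "A", "a3": "A",
    "p1": "P", "p2": "P", "p3": "P",
    "v1": "V", "v2": "V",
}
EDGES = [
    ("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p3"),
    ("p1", "v1"), ("p2", "v1"), ("p3", "v2"),
]


def build_adjacency(edges):
    """Undirected adjacency lists from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, scheme, length, rng):
    """Random walk whose node types repeat the given symmetric scheme.

    For scheme "APA" the visited types are A, P, A, P, ...; the walk
    stops early if no neighbor of the required type exists.
    """
    if scheme[0] != scheme[-1] or node_type[start] != scheme[0]:
        raise ValueError("scheme must be symmetric and match the start type")
    cycle = scheme[:-1]  # "APA" -> "AP", "APVPA" -> "APVP"
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


def skipgram_pairs(walk, window):
    """(center, context) pairs within the window, as used by skip-gram."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    rng = random.Random(0)
    # One walk corpus per scheme, i.e. per assumed clustering aspect.
    for scheme in ("APA", "APVPA"):
        for author in ("a1", "a2", "a3"):
            walk = metapath_walk(adj, NODE_TYPE, author, scheme, 9, rng)
            print(scheme, walk, len(skipgram_pairs(walk, 2)))
```

Under these assumptions, each scheme yields its own corpus of walks, so the same author node appears in different contexts per aspect (co-authors under "APA", venue-mates under "APVPA"); embeddings trained per corpus could then be clustered separately.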




Updated: 2021-06-03