Non-Exhaustive, Overlapping Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 8-6-2018, DOI: 10.1109/tpami.2018.2863278
Joyce Jiyoung Whang, Yangyang Hou, David F. Gleich, Inderjit S. Dhillon

Traditional clustering algorithms, such as K-Means, output a clustering that is disjoint and exhaustive, i.e., every single data point is assigned to exactly one cluster. However, in many real-world datasets, clusters can overlap and there are often outliers that do not belong to any cluster. While this is a well-recognized problem, most existing algorithms address either overlap or outlier detection and do not tackle the problem in a unified way. In this paper, we propose an intuitive objective function, which we call the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective, that captures the issues of overlap and non-exhaustiveness in a unified manner. Our objective function can be viewed as a reformulation of the traditional K-Means objective, with easy-to-understand parameters that capture the degrees of overlap and non-exhaustiveness. By considering an extension to weighted kernel K-Means, we show that we can also apply our NEO-K-Means idea to overlapping community detection, which is an important task in network analysis. To optimize the NEO-K-Means objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using low-rank semidefinite programming techniques. Our experimental results show that the new objective and algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness; for the case of graphs, we show that our method outperforms state-of-the-art overlapping community detection algorithms.
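The abstract does not state the objective itself. The sketch below assumes the formulation usually given for NEO-K-Means: choose a binary assignment of points to clusters that minimizes the sum of squared distances from each point to the centroids of the clusters it belongs to, subject to a total of (1 + α)n assignments and at most βn unassigned points, and optimize it with a K-Means-like alternation. The function name `neo_kmeans`, its parameters (`iters`, `init`, `seed`), and the greedy two-step assignment are illustrative assumptions, not the authors' code. The sketch covers only a simple iterative approach, not the weighted kernel or low-rank SDP variants.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neo_kmeans(X, k, alpha, beta, iters=20, init=None, seed=0):
    """Hypothetical NEO-K-Means-style alternation (a sketch, not the authors' code).

    alpha: overlap parameter; about (1 + alpha) * n point-cluster assignments are made.
    beta:  non-exhaustiveness parameter; up to beta * n points may stay unassigned.
    Returns (clusters, centers), where clusters[j] is the set of point indices in cluster j.
    """
    n, dim = len(X), len(X[0])
    if init is None:
        init = [X[i] for i in random.Random(seed).sample(range(n), k)]
    centers = [list(c) for c in init]
    n_core = n - round(beta * n)          # points that must get their nearest cluster
    n_extra = round((alpha + beta) * n)   # additional (point, cluster) pairs to add
    clusters = [set() for _ in range(k)]
    for _ in range(iters):
        D = [[sq_dist(x, c) for c in centers] for x in X]
        # Step 1: the n_core points closest to their nearest center are assigned to it.
        nearest = [min(range(k), key=lambda j, i=i: D[i][j]) for i in range(n)]
        order = sorted(range(n), key=lambda i: D[i][nearest[i]])
        assigned = {(i, nearest[i]) for i in order[:n_core]}
        # Step 2: greedily add the n_extra cheapest (point, cluster) pairs not yet used.
        rest = sorted((D[i][j], i, j) for i in range(n) for j in range(k)
                      if (i, j) not in assigned)
        assigned.update((i, j) for _dist, i, j in rest[:n_extra])
        new_clusters = [set() for _ in range(k)]
        for i, j in assigned:
            new_clusters[j].add(i)
        # Centroid update: mean of each cluster's members; an empty cluster keeps its center.
        for j, members in enumerate(new_clusters):
            if members:
                centers[j] = [sum(X[i][d] for i in members) / len(members)
                              for d in range(dim)]
        if new_clusters == clusters:  # assignments stopped changing
            break
        clusters = new_clusters
    return clusters, centers


if __name__ == "__main__":
    # Hypothetical toy data: two tight groups, a midpoint (5, 5), and an outlier (100, 100).
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5], [100, 100]]
    clusters, centers = neo_kmeans(X, 2, alpha=0.0, beta=0.125, init=[[0, 0], [10, 10]])
    print(clusters)  # [{0, 1, 2, 6}, {3, 4, 5, 6}]
    print(centers)   # [[1.5, 1.5], [9.0, 9.0]]
```

On this toy data the outlier at (100, 100) is the one point left unassigned (βn = 1) and the midpoint (5, 5) joins both clusters. In this sketch the (α + β)n extra slots go to the cheapest remaining pairs, so a slot not spent on an outlier can produce overlap even with `alpha = 0`.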

Updated: 2024-08-22