Strongly Local Hypergraph Diffusions for Clustering and Semi-supervised Learning,arXiv - CS - Social and Information Networks

arXiv - CS - Social and Information Networks. Pub Date: 2020-11-16, DOI: arxiv-2011.07752
Meng Liu, Nate Veldt, Haoyu Song, Pan Li and David F. Gleich

Hypergraph-based machine learning methods are now widely recognized as important for modeling and using higher-order and multiway relationships between data objects. Local hypergraph clustering and semi-supervised learning specifically involve finding a well-connected set of nodes near a given set of labeled vertices. Although many methods for local clustering exist for graphs, there are relatively few for localized clustering in hypergraphs. Moreover, those that exist often lack the flexibility to model a general class of hypergraph cut functions or cannot scale to large problems. To tackle these issues, this paper proposes a new diffusion-based hypergraph clustering algorithm that solves a quadratic hypergraph cut based objective akin to a hypergraph analog of Andersen-Chung-Lang personalized PageRank clustering for graphs. We prove that, for hypergraphs with fixed maximum hyperedge size, this method is strongly local, meaning that its runtime depends only on the size of the output rather than the size of the hypergraph, which makes it highly scalable. Moreover, our method enables us to compute with a wide variety of cardinality-based hypergraph cut functions. We also prove that the clusters found by solving the new objective function satisfy a Cheeger-like quality guarantee. We demonstrate that on large real-world hypergraphs our new method finds better clusters and runs much faster than existing approaches. Specifically, it runs in a few seconds on hypergraphs with a few million hyperedges, compared with minutes for flow-based techniques. We furthermore show that our framework is general enough that it can also be used to solve other p-norm based cut objectives on hypergraphs. Our code is available at github.com/MengLiuPurdue/LHQD.
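
For readers unfamiliar with the graph method the abstract uses as its point of comparison, here is a minimal Python sketch of Andersen-Chung-Lang style push for approximate personalized PageRank on an ordinary undirected graph, followed by a sweep cut. It is not the paper's hypergraph algorithm (see the authors' repository for that). The function names `acl_push` and `sweep_cut`, the adjacency-dict input format, and the default parameters are assumptions made for illustration. The sketch assumes a simple graph with no isolated nodes and no self-loops.

```python
from collections import deque


def acl_push(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `seed` using lazy-walk push steps.

    `adj` maps each node to a list of its neighbors (undirected, simple graph).
    Returns (p, r): the approximate PageRank mass and the leftover residual.
    Only nodes near the seed are ever touched, which is the "strongly local" idea.
    """
    p = {}
    r = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue  # stale queue entry: residual is already below threshold
        # Push: keep an alpha fraction at u, hold half of the rest at u,
        # and spread the other half evenly over u's neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        share = (1 - alpha) * ru / (2 * deg_u)
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p, r


def sweep_cut(adj, p):
    """Return the prefix of nodes (ordered by p[u]/deg(u)) with lowest conductance."""
    # Note: summing all degrees scans the whole graph; a truly local
    # implementation would take the total volume as a precomputed input.
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set = set()
    vol = 0
    cut = 0
    best_set, best_cond = [], float("inf")
    for i, u in enumerate(order):
        in_set.add(u)
        vol += len(adj[u])
        inside = sum(1 for v in adj[u] if v in in_set)
        # Edges from u into the set stop being cut; the rest become cut edges.
        cut += len(adj[u]) - 2 * inside
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best_cond:
            best_cond = cut / denom
            best_set = order[: i + 1]
    return best_set, best_cond


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    p, r = acl_push(adj, seed=0)
    print(sweep_cut(adj, p))
```

Each push moves residual mass at a node into its score only while that residual is at least `eps` times the node's degree, so the work done is bounded in terms of `alpha` and `eps` rather than the graph size. The abstract describes a runtime guarantee of the same flavor for the hypergraph setting.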

Updated: 2020-11-17
