Hierarchical Community Detection by Recursive Partitioning

Journal of the American Statistical Association (IF 3.7), Pub Date: 2020-11-24, DOI: 10.1080/01621459.2020.1833888
Tianxi Li ₁, Lihua Lei ₂, Sharmodeep Bhattacharyya ₃, Koen Van den Berge ₄,₅, Purnamrita Sarkar ₆, Peter J. Bickel ₄, Elizaveta Levina ₇

Abstract

The problem of community detection in networks is usually formulated as finding a single partition of the network into some “correct” number of communities. We argue that it is more interpretable and in some regimes more accurate to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive partitioning algorithm, starting with a single community and separating the nodes into two communities by spectral clustering repeatedly, until a stopping rule suggests there are no further communities. This class of algorithms is model-free, computationally efficient, and requires no tuning other than selecting a stopping rule. We show that there are regimes where this approach outperforms K-way spectral clustering, and propose a natural framework for analyzing the algorithm’s theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We apply the algorithm to a gene network based on gene co-occurrence in 1580 research papers on anemia, and identify six clusters of genes in a meaningful hierarchy. We also illustrate the algorithm on a dataset of statistics papers. Supplementary materials for this article are available online.
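The top-down procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It makes three assumptions: each split uses the sign of the eigenvector for the second-largest eigenvalue of the adjacency submatrix, which suits assortative communities; the stopping rule is a hypothetical eigenvalue threshold standing in for whatever rule the paper adopts; and `planted_hierarchy` is a made-up noiseless toy network used only to exercise the code.

```python
import numpy as np


def recursive_bipartition(A, nodes=None, threshold=0.5):
    """Build a binary community tree by repeated two-way spectral splits.

    A leaf is a sorted list of node indices; an internal node is a
    (left, right) tuple of subtrees. `threshold` is a hypothetical
    stopping rule: stop when the second-largest eigenvalue of the
    current adjacency submatrix falls below it.
    """
    A = np.asarray(A, dtype=float)
    if nodes is None:
        nodes = np.arange(A.shape[0])
    leaf = sorted(int(i) for i in nodes)
    if len(nodes) < 2:
        return leaf

    # Restrict the adjacency matrix to the current community.
    sub = A[np.ix_(nodes, nodes)]
    # eigh returns eigenvalues in ascending order for symmetric input.
    vals, vecs = np.linalg.eigh(sub)

    # Stopping rule (assumption): no strong second eigenvalue, no split.
    if vals[-2] < threshold:
        return leaf

    # Split by the sign pattern of the second eigenvector.
    mask = vecs[:, -2] >= 0
    if mask.all() or not mask.any():
        return leaf  # degenerate split, treat as a single community

    return (
        recursive_bipartition(A, nodes[mask], threshold),
        recursive_bipartition(A, nodes[~mask], threshold),
    )


def leaves(tree):
    """Flatten a community tree into a list of leaf communities."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def planted_hierarchy(block_size=4):
    """Noiseless toy network: four blocks arranged in a binary tree.

    Edge weight is 1.0 within a block, 0.5 between sibling blocks,
    and 0.1 between blocks in different halves of the tree.
    """
    B = np.array([
        [1.0, 0.5, 0.1, 0.1],
        [0.5, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.5],
        [0.1, 0.1, 0.5, 1.0],
    ])
    A = np.kron(B, np.ones((block_size, block_size)))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


if __name__ == "__main__":
    tree = recursive_bipartition(planted_hierarchy())
    print(leaves(tree))
```

On this toy input the first split separates the two halves of the tree and the second level separates sibling blocks, so the nesting of the returned tuples mirrors the planted hierarchy. On real, noisy networks the threshold would need to be replaced by a principled stopping rule.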


Updated: 2020-11-24
