Abstract
Community detection methods attempt to divide a network into groups of nodes that share similar properties, thus revealing its large-scale structure. A major challenge when employing such methods is that they are often degenerate, typically yielding a complex landscape of competing answers. As an attempt to extract understanding from a population of alternative solutions, many methods exist to establish a consensus among them in the form of a single partition “point estimate” that summarizes the whole distribution. Here, we show that it is, in general, not possible to obtain a consistent answer from such point estimates when the underlying distribution is too heterogeneous. As an alternative, we provide a comprehensive set of methods designed to characterize and summarize complex populations of partitions in a manner that captures not only the existing consensus but also the dissensus between elements of the population. Our approach is able to model mixed populations of partitions, where multiple consensuses can coexist, representing different competing hypotheses for the network structure. We also show how our methods can be used to compare pairs of partitions, how they can be generalized to hierarchical divisions, and how they can be used to perform statistical model selection between competing hypotheses.
9 More- Received 23 June 2020
- Revised 22 December 2020
- Accepted 12 January 2021
DOI:https://doi.org/10.1103/PhysRevX.11.021003
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Published by the American Physical Society
Physics Subject Headings (PhySH)
Popular Summary
Many network systems, such as the Internet, online social networks, and protein interactions, are so large and complex that to obtain insight into their function, researchers require a way to provide a summary of the large-scale network structure. This is the role of network-clustering algorithms. However, it is often the case that such large networks admit a large number of competing summaries, each focusing on a distinct aspect of their structure. Here, we provide a systematic methodology to fully characterize the distribution of solutions given by network-clustering algorithms.
The emergence of many competing summaries leads to an additional level of complexity that is not sufficiently tackled by state-of-the-art network-clustering algorithms, which tend to focus on a single summary or a single consensus among several. Our method is capable of revealing not only the consensus, or the extent to which the summaries agree, but also the dissensus, the extent to which they disagree. This is achieved by a clustering of the summaries themselves, yielding groups of results that point in a cohesive direction.
Our methodology allows us to extract more meaningful and accurate results from network-clustering algorithms as well as weight the distinct summaries according to their plausibility.