Abstract
Network inference is the process of learning the properties of complex networks from data. Besides using information about known links in the network, node attributes and other forms of network metadata can help solve network inference problems. Indeed, several approaches have been proposed to introduce metadata into probabilistic network models and to use them to make better inferences. However, we know little about the effect of such metadata in the inference process. Here, we investigate this issue. We find that, rather than affecting inference gradually, adding metadata causes a crossover in the inference process and in our ability to make accurate predictions, from a situation in which metadata do not play any role to a situation in which metadata completely dominate the inference process. When network data and metadata are partly correlated, metadata optimally contribute to the inference process at the crossover between data-dominated and metadata-dominated regimes.
- Received 8 April 2021
- Accepted 16 November 2021
DOI: https://doi.org/10.1103/PhysRevX.12.011010
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Popular Summary
Predicting whether two drugs have a harmful interaction or whether someone is going to like a certain movie are examples of network inference problems. In these problems, the goal is to predict new interactions (between drugs or between people and movies) based on some previously observed interactions. Having additional information about the network nodes, known as metadata (for example, the mechanism of action of the drugs or the age of the individuals), helps to make better predictions, though it is not clear why or how. Here, we explore how that improvement happens.
We study a very general network inference problem and show that node metadata do not affect the inference problem gradually. Rather, even when the importance assigned to the metadata increases smoothly, the inference process crosses over from a data-dominated regime to a metadata-dominated regime. These crossovers show some similarities to transitions driven by temperature, where one finds energy- and entropy-dominated regimes. Importantly, optimal inference is often encountered exactly at this crossover.
Our study opens the door to better understanding the role of metadata in network inference problems and, more broadly, establishes further connections between general inference problems and physical concepts such as phase transitions.