Localization of multiple diffusion sources based on overlapping community detection
Introduction
Diffusion and propagation processes are ubiquitous in nature and society [1], for example the spread of infectious diseases in human society [2], rumor propagation in social networks [3] and computer virus diffusion on the Internet [4]. Identifying the diffusion sources is of great significance to various practical applications [5]. For example, finding the sources of an epidemic can help determine the epidemiology of the disease, detecting the sources of a computer virus can help control or even eliminate its harmful effects, and identifying the sources of a rumor can provide meaningful information for public opinion intervention. Although extensive efforts have been made, diffusion source localization remains an extremely challenging task because of the randomness of diffusion processes and the complexity of network structures [6], [7].
Given a complex network, information originates from one or several sources. During the diffusion process, each of the remaining vertices either receives the message and becomes infected, or never hears the message and is thus tagged as uninfected. Diffusion source localization aims to locate the source(s) based on an observation of which vertices are infected and which are not.
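To make the observation model concrete, the sketch below simulates a diffusion from several sources and records which vertices end up infected, together with the subgraph induced by the infected vertices. The discrete-time SI (susceptible-infected) model, the per-edge infection probability `beta`, and the helper names `simulate_si` and `infected_subgraph` are assumptions for illustration; this excerpt does not commit to a particular propagation model.

```python
import random

def simulate_si(adj, sources, beta, steps, seed=0):
    """Discrete-time SI diffusion (an assumed model) from one or more sources.

    adj     -- dict mapping each node to the set of its neighbours
    sources -- nodes infected at time 0
    beta    -- probability that an infected node infects a given neighbour per step
    steps   -- number of time steps before the snapshot is observed
    """
    rng = random.Random(seed)
    infected = set(sources)
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly_infected.add(v)
        infected |= newly_infected  # synchronous update after the whole step
    return infected

def infected_subgraph(adj, infected):
    """Subgraph induced by the infected nodes: infected nodes plus the edges linking them."""
    return {v: adj[v] & infected for v in infected}

# A path 0-1-2-3-4 with sources at both ends, observed after one step.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
infected = simulate_si(adj, sources=[0, 4], beta=1.0, steps=1)
uninfected = set(adj) - infected          # {2}
g_i = infected_subgraph(adj, infected)    # {0: {1}, 1: {0}, 3: {4}, 4: {3}}
```

With two sources the infected subgraph here splits into two components, one per infected area; in denser networks those areas would touch or overlap instead.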
Under the assumption that a rumor originates from a single source, researchers have put forward various methods. The most commonly used approach relies on centrality measures, such as the distance center [8], Jordan center [9], rumor center [10] and unbiased betweenness [11]. Other strategies have also been adopted to locate a single source, including maximum-likelihood estimation [12], belief propagation [13], dynamic information passing [14], inverse spreading [15] and interactive querying [16]. More recently, Choi et al. [17] proposed an anti-rumor based approach that injects hidden monitors into the network to send “anti-rumor” messages for source identification.
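Two of the centrality measures listed above are simple to state on the infected graph: the Jordan center minimizes the largest hop distance to any infected node, and the distance center minimizes the sum of hop distances. A minimal sketch, assuming a connected, unweighted infected graph stored as an adjacency dict with comparable node labels (used only to break ties); the function names are illustrative.

```python
from collections import deque

def hop_distances(adj, start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def jordan_center(adj):
    """Node with minimum eccentricity (largest hop distance to any infected node)."""
    return min(adj, key=lambda v: (max(hop_distances(adj, v).values()), v))

def distance_center(adj):
    """Node with minimum total hop distance to all infected nodes."""
    return min(adj, key=lambda v: (sum(hop_distances(adj, v).values()), v))

# Infected graph: path 0-1-2-3-4 with four extra leaves 5..8 attached to node 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (1, 6), (1, 7), (1, 8)]
adj = {v: set() for v in range(9)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

print(jordan_center(adj))    # 2: every node is within 2 hops of node 2
print(distance_center(adj))  # 1: the leaves pull the distance center toward node 1
```

The example shows that different centrality measures can disagree on the same infected graph, which is one reason the choice of estimator matters.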
However, real-world scenarios usually involve more than one diffusion source, which makes identifying multiple sources a more significant and practical task than identifying a single source [18]. To date, there have been several efforts to handle the problem of multiple source localization [19]. These methods can be divided into three categories: ranking based methods [20], approximation based methods [21] and network partitioning based methods [22]. More recently, Dong et al. [23] and Wang et al. [24] have focused on locating multiple sources without prior knowledge of the underlying propagation model.
As is well known, most networks exhibit a natural community structure, i.e. sparse edges between different groups and dense edges within the same group [25]. Because messages tend to propagate within their local community, the infected graph produced by multiple sources also presents an inherent community structure. Based on this observation, Shelke and Attar [19] point out that network partitioning is a good strategy for locating multiple sources. However, several challenges remain: (1) The number of sources is crucial to the problem but is usually unavailable, and source number estimation is far from being well solved. For example, Zang et al. [18] propose a modularity based heuristic algorithm to estimate the number of sources, but their algorithm needs an input parameter that is difficult to set appropriately in advance. (2) To the best of our knowledge, none of the existing methods considers the overlapping characteristic of infected areas when partitioning the infected graph. Due to the small world phenomenon, the infected areas produced by different sources will inevitably overlap with each other; in other words, some nodes may be infected by more than one source. These overlapping nodes complicate the partitioning of the infected graph and, if ignored, degrade the performance of multiple source localization. (3) Most existing source detection methods carry out likelihood estimation based only on the infected nodes, failing to fully utilize the contagion neighborhood of source candidates. The contagion situation around a source candidate (which nodes are infected and which are not) is a beneficial supplement to the likelihood estimation and can provide meaningful information from another perspective for diffusion source localization.
Here we present a novel approach that divides the infected graph by capturing the information diffusion dynamics via the topological potential field. Our insight into the analogy between information diffusion dynamics and the topological potential field is as follows. In physics, the concept of a potential field was proposed to describe non-contact interactions between material particles, each of which possesses potential energy. With the development of classical field theory, it has become a mathematical model describing non-contact interactions between objects. Since the nodes of the infected graph are not isolated but interact with each other through edges, this model can be used to describe the interactions and associations among these nodes. In this model, each node possesses a potential (similar to the potential energy of a particle in physics). Nodes with locally maximal potential are viewed as peaks of the field, and nodes with locally minimal potential are regarded as valleys. The whole field therefore presents an inherent peak–valley structure. Because the infected graph is derived from the information diffusion process, which is itself a kind of interaction between nodes, and because each source tends to propagate within its local area, a local high-potential area in the topological potential field could correspond to the area infected by a single source. If we partition the whole infected graph into a series of local high-potential areas, the multiple source localization problem can be transformed into several single-source localization problems.
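The peak–valley picture can be illustrated with a form of topological potential that is widely used in the literature, $\varphi(v_i)=\sum_{j\neq i} m_j\, e^{-(d_{ij}/\sigma)^2}$, where $d_{ij}$ is the hop distance between $v_i$ and $v_j$, $\sigma$ is an impact factor and $m_j$ is a node mass. This is an assumption: the paper's own definition in Section 3 is truncated in this excerpt, and the unit masses, `sigma=1.0` and the strict-local-maximum rule for peaks are illustrative choices. In the toy graph, two dense groups joined by a bridge node yield two peaks, so the estimated number of sources would be two.

```python
import math
from collections import deque

def hop_distances(adj, start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def topological_potential(adj, sigma=1.0):
    """phi(v) = sum over reachable u != v of m_u * exp(-(d(v, u) / sigma) ** 2), with m_u = 1 (assumed)."""
    phi = {}
    for v in adj:
        dist = hop_distances(adj, v)
        phi[v] = sum(math.exp(-(d / sigma) ** 2) for u, d in dist.items() if u != v)
    return phi

def potential_peaks(adj, phi):
    """Peaks: nodes whose potential is strictly higher than that of every neighbour."""
    return sorted(v for v in adj if all(phi[v] > phi[u] for u in adj[v]))

# Two dense groups {0,1,2,3} and {5,6,7,8} joined through the bridge node 4.
edges = [(a, b) for group in ([0, 1, 2, 3], [5, 6, 7, 8])
         for i, a in enumerate(group) for b in group[i + 1:]]
edges += [(3, 4), (4, 5)]
adj = {v: set() for v in range(9)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

phi = topological_potential(adj, sigma=1.0)
peaks = potential_peaks(adj, phi)
print(peaks, len(peaks))  # [3, 5] 2 -> two local high-potential areas
```

The bridge node 4 sits in a valley between the two peaks; a larger `sigma` spreads each node's influence further and tends to merge nearby peaks.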
In this paper, we propose a novel multiple source localization method that handles the above three challenges simultaneously. The proposed method transforms the multiple source localization problem into several single-source localization problems by applying a topological potential field [26] based overlapping community detection method. To the best of our knowledge, our work is the first to draw the analogy between information diffusion dynamics and the topological potential field and to apply this model to the multiple source localization problem. The main contributions of this paper are summarized as follows:
- Our proposed approach applies the topological potential field to capture the information diffusion dynamics for the multiple source localization problem. By utilizing the inherent peak–valley structure of the topological potential field, the proposed method estimates the number of sources as the number of local high-potential areas. Based on the nodes’ positions in the topological potential field, the proposed method then divides the infected graph into overlapping partitions that capture the infected area of each diffusion source.
- Our proposed approach combines likelihood estimation with a contagion neighborhood bias to locate the single diffusion source in each infected partition. The contagion neighborhood includes both infected and uninfected nodes in a node’s neighborhood, and thus contains richer information than the infected nodes alone. The bias can therefore improve the likelihood estimation of the source node according to the relative numbers of infected and uninfected nodes.
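The exact form of the contagion neighborhood bias is not given in this excerpt, so the sketch below uses a hypothetical stand-in: the fraction of a candidate's neighbors in the full network that are infected, multiplied into whatever likelihood score a single-source estimator produces. The names `contagion_bias` and `best_candidate` and the multiplicative combination are assumptions, meant only to show how uninfected neighbors can separate candidates that likelihood alone cannot.

```python
def contagion_bias(adj, infected, v):
    """Hypothetical bias: fraction of v's neighbours in the full graph G that are infected."""
    neighbours = adj[v]
    if not neighbours:
        return 0.0
    return sum(1 for u in neighbours if u in infected) / len(neighbours)

def best_candidate(adj, infected, likelihood):
    """Multiply each candidate's likelihood score by its bias and return the top candidate."""
    scores = {v: likelihood[v] * contagion_bias(adj, infected, v) for v in likelihood}
    return max(scores, key=lambda v: (scores[v], v)), scores

# Candidates 0 and 1 tie on likelihood, but only node 0 has no uninfected neighbours.
adj = {0: {1, 2, 3}, 1: {0, 4, 5}, 2: {0}, 3: {0}, 4: {1}, 5: {1}}
infected = {0, 1, 2, 3}
best, scores = best_candidate(adj, infected, likelihood={0: 0.5, 1: 0.5})
print(best)  # 0
```

A multiplicative combination keeps the likelihood ranking when all candidates are fully surrounded by infected nodes, and only reorders candidates whose neighborhoods differ.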
We evaluate our proposed method from the perspectives of source number estimation, infected graph partition, and source localization. Experimental results on synthetic and real-world networks demonstrate that our proposed method can not only accurately estimate the number of diffusion sources but also locate these sources more precisely.
The remainder of this paper is organized as follows. Section 2 reviews the related works. Section 3 introduces the topological potential field. Section 4 describes the proposed method in detail. Section 5 discusses the experimental results. Finally, Section 6 provides the conclusion of this paper.
Related works
Our research concentrates on multiple source localization; therefore, we only review studies along this line in this section. The existing methods can be divided into three categories: ranking based methods, approximation based methods and network partitioning based methods.
Topological potential field
A social network G can be denoted as $G=(V,E)$, where $V$ represents the nodes in $G$ and $E$ represents the edges that link these nodes. An infected graph $G_I=(V_I,E_I)$ ($I$ stands for Infected) refers to $G$'s subgraph that consists of the infected nodes (nodes that have been activated in a diffusion process) and the edges linking them at the time of observation. Thus, $V_I \subseteq V$ and $E_I \subseteq E$. In many research fields, the topological potential field is used as a mathematical model to describe the
Method
In this section, we propose a novel method to locate multiple diffusion sources. The proposed method mainly contains three steps. Firstly, it estimates the number of diffusion sources that may exist in the infected graph. Secondly, based on the number of sources, it divides the infected graph into the corresponding number of overlapping partitions. Finally, it locates the single diffusion source individually in each divided partition. The infected graph is derived from the information diffusion
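The three steps can be sketched end to end on a toy infected graph. This is not the paper's partitioning rule, which the excerpt does not give: here each infected node joins the partition of every peak at minimal hop distance, so nodes equidistant from two peaks become overlapping nodes, and the single source in each partition is taken as its Jordan center. The peaks are assumed to come from step 1 (for instance, strict local maxima of the topological potential), and all function names are hypothetical.

```python
from collections import deque

def hop_distances(adj, start, allowed):
    """BFS hop distances from start, walking only through nodes in `allowed`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def overlapping_partitions(adj, peaks):
    """Step 2 (hypothetical rule): each node joins every peak at minimal hop distance."""
    nodes = set(adj)
    dist_from = {p: hop_distances(adj, p, nodes) for p in peaks}
    parts = {p: set() for p in peaks}
    for v in nodes:
        reach = {p: dist_from[p][v] for p in peaks if v in dist_from[p]}
        if reach:
            nearest = min(reach.values())
            for p, d in reach.items():
                if d == nearest:
                    parts[p].add(v)  # ties put v in several partitions
    return parts

def jordan_center(adj, part):
    """Step 3: single-source estimate = minimum-eccentricity node inside one partition."""
    return min(part, key=lambda v: (max(hop_distances(adj, v, part).values()), v))

def locate_sources(adj, peaks):
    """Step 1 is assumed done: len(peaks) is the estimated number of sources."""
    parts = overlapping_partitions(adj, peaks)
    return [jordan_center(adj, parts[p]) for p in peaks], parts

# Infected graph: path 0-1-...-8, with peaks 2 and 6 assumed to come from step 1.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 9} for i in range(9)}
sources, parts = locate_sources(adj, peaks=[2, 6])
print(sources)              # [2, 6]
print(parts[2] & parts[6])  # {4}: node 4 is equidistant from both peaks
```

Assigning ties to every nearest peak keeps each partition connected, because the predecessor of a node on a shortest path to its peak is at least as close to that peak as to any other.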
Experimental results
In this section, our proposed multiple source detection method is dubbed TP (Topological Potential). The performance of TP is evaluated from the perspectives of source number estimation, infected graph partition, and source localization.
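The evaluation criteria are not detailed in this excerpt. As one hypothetical example of localization and source-number metrics, the sketch below computes the average hop distance from each true source to its nearest detected source, and the absolute error in the estimated number of sources. Both metrics and the function names are assumptions; a stricter evaluation might instead match true and detected sources one-to-one.

```python
from collections import deque

def hop_distances(adj, start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def error_distance(adj, true_sources, detected):
    """Mean hop distance from each true source to its nearest detected source.

    Assumes every true source can reach at least one detected source.
    """
    total = 0
    for s in true_sources:
        dist = hop_distances(adj, s)
        total += min(dist[d] for d in detected if d in dist)
    return total / len(true_sources)

def number_error(true_sources, detected):
    """Absolute error of the estimated number of sources."""
    return abs(len(set(detected)) - len(set(true_sources)))

adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 9} for i in range(9)}
print(error_distance(adj, [1, 7], [2, 6]))  # 1.0
print(number_error([1, 7], [2, 6]))         # 0
```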
Conclusion
Multiple diffusion source localization has attracted considerable research effort recently. Most existing methods ignore the overlapping characteristic of the infected graph and fail to fully utilize the contagion neighborhood of source candidates. Furthermore, source number estimation is also far from being well solved. To deal with these problems, this paper proposes a novel multiple source detection method based on overlapping community detection. The proposed method applies the
CRediT authorship contribution statement
Zhixiao Wang: Conceptualization, Writing - original draft, Funding acquisition. Chengcheng Sun: Methodology, Software, Visualization. Xiaobin Rui: Validation, Investigation, Writing - review & editing. Philip S. Yu: Supervision, Resources, Project administration. Lichao Sun: Formal analysis, Data curation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61876186) and the Fundamental Research Funds for the Central Universities of China (No. 2019XKQYMS85).
References (42)
- Stochastic approximation algorithms for rumor source inference on graphs. Perform. Eval. (2019).
- An universal algorithm for source location in complex networks. Physica A (2019).
- Identifying the diffusion source in complex networks with limited observers. Physica A (2019).
- Multiple propagation paths enhance locating the source of diffusion in complex networks. Physica A (2019).
- Locating multiple sources in social networks under the SIR model: A divide-and-conquer approach. J. Comput. Sci. (2015).
- Source detection of rumor in social network–a review. Online Soc. Netw. Media (2019).
- Discovering multiple diffusion source nodes in social networks. Procedia Comput. Sci. (2014).
- Locating multiple diffusion sources in time varying networks from sparse observations. Sci. Rep. (2018).
- Tracking the evolution of overlapping communities in dynamic social networks. Knowl.-Based Syst. (2018).
- Identifying propagation sources in networks: State-of-the-art and comparative studies. IEEE Commun. Surv. Tutor. (2016).