Elsevier

Knowledge-Based Systems

Volume 226, 17 August 2021, 106613

Localization of multiple diffusion sources based on overlapping community detection

https://doi.org/10.1016/j.knosys.2020.106613

Abstract

Localization of multiple diffusion sources is of great importance to various practical applications. However, accurately estimating the number of sources, which usually serves as the first step in this field, is still a challenging task. Most existing methods ignore the overlapping characteristic of the infected graph and thus find it hard to identify the appropriate infected partition for each diffusion source. Furthermore, these methods fail to fully utilize the contagion neighborhood of source candidates, resulting in poor source localization accuracy. To overcome these problems, we propose a novel multiple-source detection method that utilizes both overlapping community detection and contagion neighborhood bias. The proposed method first uses the inherent peak–valley structure of the topological potential field to estimate the number of sources. Then, it divides the infected graph into overlapping communities based on node position analysis in the topological potential field, which provides better partitions for source localization. Finally, it locates the single source in each partition based on likelihood estimation and a contagion neighborhood bias that takes into account both infected and uninfected nodes in a node’s neighborhood. In the experiments, we evaluate our method on both real-world and synthetic networks with various scales and structural features. The results show that our method not only estimates the number of sources more accurately but also locates each source more precisely, outperforming existing state-of-the-art methods. In addition, our method performs stably on different kinds of synthetic networks, exhibiting good robustness.

Introduction

Diffusion and propagation processes are ubiquitous in nature and society [1], such as infectious diseases spreading in human society [2], rumor propagation in social networks [3] and computer virus diffusion on the Internet [4]. Identifying the sources of such diffusion is important for various practical applications [5]. For example, finding the sources of an epidemic can help determine the epidemiology of the disease, detecting the sources of computer viruses can help control or even eliminate their harmful effects, and identifying the sources of a rumor can provide meaningful information for public opinion intervention. Although extensive efforts have been made, diffusion source localization is still an extremely challenging task because of the randomness of diffusion processes and the complexity of network structures [6], [7].

Given a complex network, information originates from one or several sources. During the information diffusion process, every other vertex either receives the message and becomes infected, or never hears the message and is therefore tagged as uninfected. Diffusion source localization aims to locate these source(s) from an observation of which vertices are infected and which are not.

Based on the assumption that a rumor originates from a single source, researchers have put forward various methods. The most commonly used approaches are centrality measures, such as distance center [8], Jordan center [9], rumor center [10] and unbiased betweenness [11]. Other strategies have also been adopted to locate a single source, including maximum-likelihood estimation [12], belief propagation [13], dynamic message passing [14], inverse spreading [15] and interactive query [16]. More recently, Choi et al. [17] proposed an anti-rumor based approach that injects hidden monitors into the network to send “anti-rumor” messages for source identification.
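To make the centrality idea concrete, the sketch below computes a Jordan center: the infected node whose largest hop distance to any other infected node is smallest. It is a minimal illustration, assuming an unweighted, undirected graph stored as an adjacency dict and distances measured inside the infected subgraph; the helper names `bfs_distances` and `jordan_center` are hypothetical and are not taken from [9].

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distances from `start` to every node reachable in `adj`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def jordan_center(adj, infected):
    """Infected node with minimum eccentricity inside the infected subgraph."""
    infected = set(infected)
    # Keep only edges whose endpoints are both infected (assumption: SI-style spread).
    sub = {u: [w for w in adj[u] if w in infected] for u in infected}
    best, best_ecc = None, float("inf")
    for u in sorted(infected):  # sorted so ties go to the smallest node id
        dist = bfs_distances(sub, u)
        if len(dist) < len(infected):
            continue  # u cannot reach every infected node
        ecc = max(dist.values())
        if ecc < best_ecc:
            best, best_ecc = u, ecc
    return best
```

For example, on a path 0-1-2-3-4-5 where nodes 0 to 4 are infected and node 5 is not, the function returns 2, the middle of the infected segment.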

However, in real-world scenarios there is usually more than one diffusion source, which makes identifying multiple sources a more significant and practical task than identifying a single source [18]. To date, several efforts have addressed the problem of multiple-source localization [19]. These methods can be divided into three categories: ranking based methods [20], approximation based methods [21] and network partitioning based methods [22]. More recently, Dong et al. [23] and Wang et al. [24] have worked on locating multiple sources without prior knowledge of the underlying propagation model.

As is well known, most networks exhibit a natural community structure, i.e., dense edges within the same group and sparse edges between different groups [25]. Because messages tend to propagate within a local community, the infected graph of multiple sources presents an inherent community structure. On this basis, Shelke and Attar [19] point out that network partitioning is a good strategy for locating multiple sources. However, several challenges remain: (1) The number of sources is crucial to the problem but is usually unavailable, and source number estimation is far from being well solved. For example, Zang et al. [18] propose a modularity-based heuristic algorithm to estimate the number of sources, but their algorithm needs an input parameter that is difficult to set appropriately in advance. (2) To the best of our knowledge, none of the existing methods considers the overlapping characteristic when partitioning the infected graph. Due to the small-world phenomenon, the infected areas produced by different sources inevitably overlap with each other; in other words, some nodes may be infected by more than one source. These overlapping nodes complicate the partition of the infected graph and, if ignored, degrade the performance of multiple-source localization. (3) Most existing source detection methods carry out likelihood estimation based only on the infected nodes, failing to fully utilize the contagion neighborhood of source candidates. In fact, the contagion situation around a source candidate (which nodes are infected and which are not) is a beneficial supplement to likelihood estimation and provides meaningful information for diffusion source localization from another perspective.

Here we present a novel approach that divides the infected graph by capturing the information diffusion dynamics via the topological potential field. Our insight into the analogy between information diffusion dynamics and the topological potential field is as follows. In physics, the concept of a potential field was proposed to describe non-contact interactions between material particles, each of which has potential energy. With the development of classical field theory, it has become a mathematical model describing non-contact interactions between objects. Since the nodes of the infected graph are not isolated but interact with each other through edges, this model can be used to describe the interactions and associations among these nodes. In the resulting topological potential field, each node possesses a potential (analogous to the potential energy of a particle in physics). Nodes with locally maximal potential are viewed as peaks of the field and nodes with locally minimal potential as valleys, so the whole field presents an inherent peak–valley structure. The infected graph is derived from the information diffusion process, which is itself a kind of interaction between nodes, and each source tends to propagate within its local area. Therefore, a local high-potential area in the topological potential field can correspond to the infected area of a single source. If we partition the whole infected graph into a series of local high-potential areas, the multiple-source localization problem can be transformed into several single-source localization problems.
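The peak-counting idea can be sketched as follows. The potential used here is an assumption, namely the Gaussian form common in the topological potential literature, φ(v_i) = Σ_j exp(-(d_ij/σ)^2) with unit node masses, hop distance d_ij, impact factor σ, and an influence range of about ⌊3σ/√2⌋ hops; the paper's own definition is given in its Section 3 and may differ. A peak is taken to be a node whose potential is strictly greater than that of every neighbor, and the number of peaks serves as the estimated number of sources. All function names are hypothetical.

```python
import math
from collections import deque


def hop_distances(adj, start, max_hops):
    """BFS hop distances from `start`, not expanding beyond `max_hops`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def topological_potential(adj, sigma=1.0):
    """Assumed form: sum of exp(-(d/sigma)^2) over other nodes within range."""
    max_hops = max(1, int(3 * sigma / math.sqrt(2)))  # assumed influence range
    phi = {}
    for v in adj:
        dist = hop_distances(adj, v, max_hops)
        phi[v] = sum(math.exp(-(d / sigma) ** 2) for u, d in dist.items() if u != v)
    return phi


def potential_peaks(adj, phi):
    """Nodes whose potential is strictly higher than every neighbor's."""
    return sorted(v for v in adj if all(phi[v] > phi[w] for w in adj[v]))


def estimate_source_count(adj, sigma=1.0):
    """Number of local high-potential areas, used as the source-count estimate."""
    return len(potential_peaks(adj, topological_potential(adj, sigma)))
```

On two five-node stars joined by an edge between one leaf of each, the two star centers are the only peaks, so the estimate is 2. Because the comparison is strict, two adjacent nodes with exactly equal potential are both rejected; a fuller implementation would need a rule for such plateaus.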

In this paper, we propose a novel multiple-source localization method that handles the above three challenges simultaneously. The proposed method transforms the multiple-source localization problem into several single-source localization problems by applying an overlapping community detection method based on the topological potential field [26]. To the best of our knowledge, our work is the first to draw the analogy between information diffusion dynamics and the topological potential field and to apply this model to the multiple-source localization problem. The main contributions of this paper are summarized as follows:

  • Our proposed approach applies the topological potential field to capture information diffusion dynamics for the multiple-source localization problem. By utilizing the inherent peak–valley structure of the field, it estimates the number of sources as the number of local high-potential areas. Based on the nodes’ positions in the field, it divides the infected graph into overlapping partitions that capture the infected area of each diffusion source.

  • Our proposed approach combines likelihood estimation with a contagion neighborhood bias to locate the single diffusion source in each infected partition. The contagion neighborhood includes both the infected and the uninfected nodes in a node’s neighborhood, and thus contains richer information than the infected nodes alone. The bias can therefore improve the likelihood estimate of the source node according to the relative numbers of infected and uninfected nodes.
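The contagion neighborhood idea can be illustrated with a deliberately simple, hypothetical scoring rule: a candidate's bias is the fraction of infected nodes in its h-hop neighborhood in the full graph G, so uninfected neighbors lower it, and each candidate's likelihood is multiplied by this bias. Both the fraction and the multiplicative combination are assumptions for illustration, not the paper's formula, and the likelihood values are supplied by the caller.

```python
from collections import deque


def h_hop_neighborhood(adj, v, h):
    """Nodes within h hops of v in the full graph G (v itself excluded)."""
    seen = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if seen[u] == h:
            continue
        for w in adj[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    seen.pop(v)
    return set(seen)


def contagion_bias(adj, infected, v, h=1):
    """Hypothetical bias: share of v's h-hop neighborhood that is infected."""
    nbrs = h_hop_neighborhood(adj, v, h)
    if not nbrs:
        return 0.0
    return len(nbrs & set(infected)) / len(nbrs)


def rank_candidates(adj, infected, likelihood, h=1):
    """Re-rank candidates by likelihood * bias (hypothetical combination)."""
    scores = {v: likelihood[v] * contagion_bias(adj, infected, v, h) for v in likelihood}
    return sorted(scores, key=lambda v: (-scores[v], str(v)))
```

For example, if candidates a and b have equal likelihood but one of b's two neighbors is uninfected, b's bias is 0.5 while a's is 1.0, so a is ranked first.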

We evaluate our proposed method from the perspectives of source number estimation, infected graph partition, and source localization. Experimental results on synthetic and real-world networks demonstrate that our proposed method can not only accurately estimate the number of diffusion sources, but also locate these sources more precisely.

The remainder of this paper is organized as follows. Section 2 reviews the related works. Section 3 introduces the topological potential field. Section 4 describes the proposed method in detail. Section 5 discusses the experimental results. Finally, Section 6 provides the conclusion of this paper.

Section snippets

Related works

Our research concentrates on multiple-source localization, so this section reviews only studies along this line. The existing methods can be divided into three categories: ranking based methods, approximation based methods and network partitioning based methods.

Topological potential field

A social network $G$ can be denoted as $G=(V,E)$, where $V$ is the set of nodes in $G$ and $E$ is the set of edges linking these nodes. An infected graph $G_I=(V_I,E_I)$ (I stands for Infected) is the subgraph of $G$ that consists of the infected nodes (nodes that have been activated in a diffusion process) and the edges linking them at the time of observation. Thus, $V_I \subseteq V$ and $E_I=\{(v_i,v_j) \mid v_i,v_j \in V_I,\ (v_i,v_j) \in E\}$. In many research fields, the topological potential field is used as a mathematical model to describe the
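The definition of $G_I$ translates directly into code. The sketch below assumes $G$ is given as a collection of nodes plus a list of edges as node pairs; the function name `infected_subgraph` is hypothetical.

```python
def infected_subgraph(nodes, edges, infected):
    """Return (V_I, E_I): infected nodes, and edges of G with both ends infected."""
    nodes = set(nodes)
    v_inf = set(infected)
    if not v_inf <= nodes:  # enforce V_I ⊆ V
        raise ValueError("every infected node must belong to V")
    # E_I = {(v_i, v_j) | v_i, v_j in V_I, (v_i, v_j) in E}
    e_inf = {(u, w) for (u, w) in edges if u in v_inf and w in v_inf}
    return v_inf, e_inf
```

For $G$ with edges (1,2), (2,3), (3,4) and infected nodes {1,2,3}, $E_I$ keeps (1,2) and (2,3) and drops (3,4), because node 4 is uninfected.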

Method

In this section, we propose a novel method to locate multiple diffusion sources. The proposed method consists of three main steps. First, it estimates the number of diffusion sources that may exist in the infected graph. Second, based on this number, it divides the infected graph into the corresponding number of overlapping partitions. Finally, it locates a single diffusion source in each partition individually. The infected graph is derived from the information diffusion
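The three steps can be wired together in a toy end-to-end sketch. Every component is a simplified stand-in chosen for illustration, not the paper's method: step 1 uses an assumed Gaussian topological potential (unit masses, impact factor σ, range about ⌊3σ/√2⌋ hops) and counts its strict local maxima; step 2 assigns each node to every peak at its minimum hop distance, so equidistant nodes become overlapping members; step 3 takes the Jordan center of each partition in place of the paper's likelihood estimation with contagion neighborhood bias. The input is assumed to be the infected graph as an adjacency dict, and all function names are hypothetical.

```python
import math
from collections import deque


def bfs(adj, start, max_hops=None):
    """Hop distances from `start`; stops expanding at `max_hops` if given."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if max_hops is not None and dist[u] == max_hops:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def estimate_peaks(adj, sigma=1.0):
    """Step 1 (assumed potential form): strict local maxima of the potential."""
    max_hops = max(1, int(3 * sigma / math.sqrt(2)))
    phi = {v: sum(math.exp(-(d / sigma) ** 2)
                  for u, d in bfs(adj, v, max_hops).items() if u != v)
           for v in adj}
    return sorted(v for v in adj if all(phi[v] > phi[w] for w in adj[v]))


def overlapping_partitions(adj, peaks):
    """Step 2 (simplified stand-in): join every peak at minimum hop distance."""
    dists = {p: bfs(adj, p) for p in peaks}
    parts = {p: set() for p in peaks}
    for v in adj:
        reach = {p: dists[p][v] for p in peaks if v in dists[p]}
        if not reach:
            continue  # v is not connected to any peak
        nearest = min(reach.values())
        for p, d in reach.items():
            if d == nearest:
                parts[p].add(v)  # several equidistant peaks -> overlapping node
    return parts


def locate_in_partition(adj, part):
    """Step 3 (stand-in): Jordan center of the partition's induced subgraph."""
    sub = {u: [w for w in adj[u] if w in part] for u in part}
    best, best_ecc = None, math.inf
    for u in sorted(part):  # sorted so ties go to the smallest node id
        dist = bfs(sub, u)
        if len(dist) == len(part) and max(dist.values()) < best_ecc:
            best, best_ecc = u, max(dist.values())
    return best


def locate_sources(adj, sigma=1.0):
    """Run the three steps on an infected graph given as an adjacency dict."""
    peaks = estimate_peaks(adj, sigma)
    parts = overlapping_partitions(adj, peaks)
    return sorted(locate_in_partition(adj, parts[p]) for p in peaks)
```

On two five-node stars whose leaves 4 and 14 are linked, with an extra node 20 attached to both of those leaves, node 20 is equidistant from the two peaks and lands in both partitions, and the located sources are the two star centers.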

Experimental results

In this section, our proposed multiple-source detection method is dubbed TP (Topological Potential). The performance of TP is evaluated from the perspectives of source number estimation, infected graph partition, and source localization.

Conclusion

Localization of multiple diffusion sources has recently attracted considerable research effort. Most existing methods ignore the overlapping characteristic of the infected graph and fail to fully utilize the contagion neighborhood of source candidates. Furthermore, source number estimation is far from being well solved. To deal with these problems, this paper proposes a novel multiple-source detection method based on overlapping community detection. The proposed method applies the

CRediT authorship contribution statement

Zhixiao Wang: Conceptualization, Writing - original draft, Funding acquisition. Chengcheng Sun: Methodology, Software, Visualization. Xiaobin Rui: Validation, Investigation, Writing - review & editing. Philip S. Yu: Supervision, Resources, Project administration. Lichao Sun: Formal analysis, Data curation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61876186) and the Fundamental Research Funds for the Central Universities of China (No. 2019XKQYMS85).

References (42)

  • N. Antulov-Fantulin et al., Identification of patient zero in static and temporal networks: Robustness and limitations, Phys. Rev. Lett. (2015)
  • J. Zhang et al., Rumor initiator detection in infected signed networks
  • C. Huang et al., A survey on algorithms for epidemic source identification on complex networks, Chinese J. Comput. (2018)
  • S. Lim et al., Approximating the k-minimum distance rumor source detection in online social networks
  • K. Zhu et al., Information source detection in the SIR model: A sample-path-based approach, IEEE/ACM Trans. Netw. (2014)
  • D. Shah, T. Zaman, Rumor centrality: a universal source detector, in: Proceedings of the 12th ACM...
  • C.H. Comin et al., Identifying the starting point of a spreading process in complex networks, Phys. Rev. E (2011)
  • F. Altarelli et al., Bayesian inference of epidemics on networks via belief propagation, Phys. Rev. Lett. (2014)
  • A.Y. Lokhov et al., Inferring the origin of an epidemic with a dynamic message-passing algorithm, Phys. Rev. E (2014)
  • Z. Shen et al., Locating the source of diffusion in complex networks by time-reversal backward spreading, Phys. Rev. E (2016)
  • J. Choi et al., Rumor source detection under querying with untruthful answers