Skip to main content
Log in

Dynamic top-k influence maximization in social networks

  • Published:
GeoInformatica Aims and scope Submit manuscript

Abstract

The problem of top-k influence maximization is to find the set of k users in a social network that can maximize the spread of influence under certain influence propagation model. This paper studies the influence maximization problem together with network dynamics. For example, given a real-life social network that evolves over time, we want to find k most influential users on everyday basis. This dynamic influence maximization problem has wide applications in practice. However, to our best knowledge, there is little prior work that studies this problem. Applying existing influence maximization algorithms at every time step provides a straightforward solution to the dynamic top-k influence maximization problem. Such a solution is, however, inefficient as it completely ignores the smoothness of network change. By analyzing two real social networks, Brightkite and Gowalla, we observe that the top-k influential set, as well as its influence value, does not change dramatically over time. Hence, it is possible to find the new top-k influential set by updating the previous one. We propose an efficient incremental update framework that takes advantage of such smoothness of network change. The proposed method achieves the same approximation ratio of 1 − e− 1 as its state-of-the-art static counterparts. Our experiments show that the proposed method outperforms the straightforward solution by a wide margin.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Notes

  1. https://www.brandwatch.com/blog/react-50-most-influential-people-on-twitter-2019/

  2. https://twitter.com/

  3. To be exact, I(Φ1) = 2.8224942 and I(Φ2) = 4.117388.

  4. Given a directed graph G = (V,E), the reachability of a vertex vV is the number of vertices that are reachable from v (inclusive of v).

  5. SNAP: http://snap.stanford.edu/index.html

  6. We are aware of the logical fallacy of post hoc ergo propter hoc. Nonetheless, we believe that the simple assumption here reflects certain aspects of the reality. After all, deliberate design of more sophisticated models will be a detraction from our main focus in this paper.

  7. We put Eδ in the notation \(G_{\delta }=\left (V_{s},E_{s},E_{\delta }\right )\) to emphasize the fact that Gs “grows out of” Eδ, and Eδ is of essential importance in Gs, as we will see shortly in Lemmata 1-2.

  8. Note that \(d^{\prime }_{v}\equiv {\Delta } I^{\prime }({\varPhi }_{i-1})\) for any v. In our implementation, we override the function by passing only one value instead of a vector.

  9. We remove the experiments of varying k due to limitations of space and analogous performance to other sensitivity experiments.

  10. http://www.arXiv.org

  11. Due to the limitation of space, we remove the experiments of varying this parameter p. The results are very similar to what we present in this section.

References

  1. Bao H, Chang EC (2010) Adheat: an influence-based diffusion model for propagating hints to match ads. In: WWW

  2. Chen W, Wang C, Wang Y (2010) Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: KDD

  3. Goyal A, Bonchi F, Lakshmanan LVS (2011) A data-based approach to social influence maximization. PVLDB 5(1):73–84

    Google Scholar 

  4. Yuan Y, Wang G, Wang H, Chen L (2011) Efficient subgraph search over large uncertain graphs. PVLDB 4(11):876–886

    Google Scholar 

  5. Yuan Y, Wang G, Chen L, Wang H (2012) Efficient subgraph similarity search on large probabilistic graph databases. PVLDB 5(9):800–811

    Google Scholar 

  6. Yuan Y, Wang G, Jeffery YX, Chen L (2015) Efficient distributed subgraph similarity matching. VLDB J 24(3):369–394

    Article  Google Scholar 

  7. Yuan Y, Lian X, Chen L, Sun Y, Wang G (2016) RSKNN: kNN search on road networks by incorporating social influence. TKDE 28(6):1575–1588

    Google Scholar 

  8. Chen L, Shang S, Yang C, Li J (2020) Spatial keyword search: a survey. GeoInformatica 24(1):85–106

    Article  Google Scholar 

  9. Chen L, Shang S (2019) Approximate spatio-temporal top-k publish/subscribe. World Wide Web Journal 22(5):2153–2175

    Article  Google Scholar 

  10. Shang S, Chen L, Wei Z, Jensen CS, Zheng K, Kalnis P (2018) Parallel trajectory similarity joins in spatial networks. VLDB J 27(3):395–420

    Article  Google Scholar 

  11. Shang S, Chen L, Zheng K, Jensen CS, Wei Z, Kalnis P (2019) Parallel trajectory-to-location join. TKDE 31(6):1194–1207

    Google Scholar 

  12. Chen Y-C, Peng W-C, Lee S-Y (2012) Efficient algorithms for influence maximization in social networks. KAIS 33(3):577–601

    Google Scholar 

  13. Domingos P, Richardson M (2001) Mining the network value of customers. In: KDD

  14. Goyal A, Lu W, Lakshmanan LVS (2011) CELF++: optimizing the greedy algorithm for influence maximization in social networks. In: WWW (Companion Volume),

  15. Kempe D, Kleinberg JM, Tardos E (2003) Maximizing the spread of influence through a social network. In: KDD

  16. Kimura M, Saito K (2006) Tractable models for information diffusion in social networks. In: PKDD

  17. Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen JM, Glance NS (2007) Cost-effective outbreak detection in networks. In: KDD

  18. Long C, Wong RC-W (2011) Minimizing seed set for viral marketing. In: ICDM

  19. Richardson M, Domingos R (2002) Mining knowledge-sharing sites for viral marketing. In: KDD

  20. Cohen E (1997) Size-estimation framework with applications to transitive closure and reachability. JCSS 55(3):441–453

    MathSciNet  MATH  Google Scholar 

  21. Cheng S, Shen H, Huang J, Zhang G, Cheng X (2013) Staticgreedy: solving the scalability-accuracy dilemma in influence maximization. In: CIKM

  22. Kim J, Kim S-K, Yu H (2013) Scalable and parallelizable processing of influence maximization for large-scale social networks. In: ICDE

  23. Liu X, Li M, Li S, Peng S, Liao X, Lu X (2014) IMGPU: Gpu-accelerated influence maximization in large-scale social networks. TPDS 25(1):136–145

    Google Scholar 

  24. Zhuang H, Sun Y, Tang J, Zhang J, Sun X (2013) Influence maximization in dynamic social networks. In: ICDM

  25. Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social networks. In: KDD

  26. Tarjan RE (1972) Depth-first search and linear graph algorithms. SICOMP 1(2):146–160

    Article  MathSciNet  Google Scholar 

  27. Wang C, Tang J, Sun J, Han J (2011) Dynamic social influence analysis through time-dependent factor graphs. In: ASONAM

  28. Eppstein D (1994) Finding the k shortest paths. In: FOCS

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Hao Wang or Leong Hou U.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: : Proof of Lemma 3

Appendix: : Proof of Lemma 3

Consider the set Gs of all possible random subgraphs of G. The size of Gs is n = 2|E| since each edge is considered independently of the others. By definition, the influence of \(U\subseteq V\) is

$$ \begin{array}{@{}rcl@{}} I(U|G)=\sum\limits_{i=1}^{n}p_{i}\cdot r\left( U\left|G_{i}\right.\right), \end{array} $$

where Gi = (V,Ei) ∈Gs and pi is its occurring probability,

$$ \begin{array}{@{}rcl@{}} p_{i}=\underset{e_{u,v}\in E_{i}}{\prod}p_{u,v}\cdot\underset{e_{u,v}\not\in E_{i}}{\prod}\left( 1-p_{u,v}\right). \end{array} $$

Similarly, there is a set \(\mathbf {G}_{s}^{+}\) containing all the \(2^{\left |E^{+}\right |}\) possible random graphs of G+. Each GiGs is also a possible random subgraph of G+, though its occurring probability may change since pu,v’s may change.

\(\mathbf {G}_{s}^{+}\) can be constructed by augmenting Gi’s. Consider the set of newly added edges in G+. (There are in total \(c=\left |E^{+}\setminus E\right |\) such edges.) In each Gi, we enumerate all possible m = 2c occurrence patterns of these newly added edges. As a consequence, each GiGs corresponds to a clutter of m subgraphs in \(\mathbf {G}_{s}^{+}\), denoted as \(\left \{G^{+}_{i1},G^{+}_{i2},\cdots ,G^{+}_{im}\right \}\). The probability distribution within the clutter is the same for every Gi; let qj be the occurring probability of \(G^{+}_{ij}\) given Gi. Now we get

$$ \begin{array}{@{}rcl@{}} I(U|G^{+})=\sum\limits_{i=1}^{n}p^{\prime}_{i}\left( \sum\limits_{j=1}^{m} q_{j}\cdot r\left( U\left|G^{+}_{ij}\right.\right)\right), \end{array} $$

and thus

$$ \begin{array}{@{}rcl@{}} {\Delta} I(U)=\sum\limits_{i=1}^{n}p^{\prime}_{i}\left( \sum\limits_{j=1}^{m} q_{j}\cdot r\left( U\left|G^{+}_{ij}\right.\right)\right)-\sum\limits_{i=1}^{n}p_{i}\cdot r\left( U\left|G_{i}\right.\right). \end{array} $$

Based on the above equation, we can revise ΔI(ST) ≤ I(S) + I(T) into the following inequality.

$$ \begin{array}{@{}rcl@{}} \sum\limits_{i=1}^{n} p^{\prime}_{i}\left( \sum\limits_{j=1}^{m} q_{j}\cdot\beta(S,T,G^{+}_{ij})\right)\geq \sum\limits_{i=1}^{n} p_{i}\cdot\beta(S,T,G_{i}), \end{array} $$
(2)

where \(\beta (S,T,\cdot )=r\left (S\left |\cdot \right .\right )+r\left (T\left |\cdot \right .\right )-r\left (S\cup T\left |\cdot \right .\right )\) gives the number of vertices that are commonly reachable from S and T in a given graph. One intuitive and yet essential property of β is that if \(G\subseteq G^{\prime }\) then \(\beta (S,T,G)\leq \beta (S,T,G^{\prime })\).

In Eq. 2 we notice that \(p^{\prime }_{i}\) could be smaller than pi. However, whenever this happens, there must exist some such that (i) \(p_{\ell }>p^{\prime }_{i}\) and (ii) \(G_{i}\subseteq G_{\ell }\) thus β(S,T,Gi) ≤ β(S,T,G). In general, graph update ΔE+ tends to shift the probability distribution towards subgraphs that have more edges. Therefore, we have

$$ \begin{array}{@{}rcl@{}} \sum\limits_{i=1}^{n} p^{\prime}_{i}\left( \sum\limits_{j=1}^{m} q_{j}\cdot\beta(S,T,G^{+}_{ij})\right)&\geq& \sum\limits_{i=1}^{n} p_{i}\left( \sum\limits_{j=1}^{m} q_{j}\cdot\beta(S,T,G^{+}_{ij})\right)\\ &\geq&\sum\limits_{i=1}^{n} p_{i}\cdot\beta(S,T,G_{i}). \end{array} $$

The last inequality holds because all \(G^{+}_{ij}\)’s are supergraphs of Gi, therefore \(\beta (S,T,G^{+}_{ij})\geq \beta (S,T,G_{i})\) and so is their non-negative linear combination.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, B., Wang, H. & U, L.H. Dynamic top-k influence maximization in social networks. Geoinformatica 26, 323–346 (2022). https://doi.org/10.1007/s10707-020-00419-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10707-020-00419-6

Keywords

Navigation