Abstract
Influence maximization (IM) under a continuous-time diffusion model requires finding a set of initial adopters that, when activated, leads to the maximum expected number of users becoming activated within a given amount of time. State-of-the-art approximation algorithms for this intractable problem use reverse reachability influence samples to approximate the diffusion process. Unfortunately, these algorithms require storing large collections of such samples, which can become prohibitive depending on the desired solution quality, the properties of the diffusion process, and the seed set size. To remedy this, we design an algorithm that allows the influence samples to be processed in a streaming manner, avoiding the need to store them. We approach IM using two fractional objectives: a fractional relaxation and a multi-linear extension of the original objective function. We derive a progressively improved upper bound on the optimal solution, which we empirically find to be tighter than the best existing upper bound. This enables instance-dependent solution quality guarantees that are observed to be vastly superior to the theoretical worst case. Leveraging these, we develop an algorithm that delivers solutions with a superior empirical solution quality guarantee at comparable running time and with greatly reduced memory usage compared to the state-of-the-art. We demonstrate the superiority of our approach via extensive experiments on five real datasets of varying sizes, with up to 41M nodes and 1.5B edges.
Notes
Recall that \(\mathbf {x}^*\) is the optimal fractional solution to F (see Eq. 2).
Notice that \(1/\lambda = [\sum _{i=1}^n (1/\lambda _i)^\ell ]^{1/\ell }\).
We utilize a version of IMM that corrects the issue raised by [10].
All implementations were compiled with the Intel compiler (ICC 18.0.1) at optimization level -O3.
OpenMP is used for parallel execution.
Samples that OPIM uses in its validation phase are not counted.
References
Ageev, A., Sviridenko, M.: Pipage rounding: a new method of constructing algorithms with proven performance guarantee. J. Comb. Optim. 8(3), 307–328 (2004)
Arora, A., Galhotra, S., Ranu, S.: Debunking the myths of influence maximization: an in-depth benchmarking study. In: SIGMOD, pp. 651–666 (2017)
Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: KDD, pp. 671–680 (2014)
Bateni, M., Esfandiari, H., Mirrokni, V.: Almost optimal streaming algorithms for coverage problems. In: SPAA, pp. 13–23 (2017)
Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing social influence in nearly optimal time. In: SODA, pp. 946–957 (2014)
Bury, K.V.: Statistical Models in Applied Science. Wiley, London (1975)
Calinescu, G., Chekuri, C., Pál, M., Vondrák, J.: Maximizing a submodular set function subject to a matroid constraint (extended abstract). In: IPCO, pp. 182–196 (2007)
Chakrabarti, A., Wirth, A.: Incidence geometries and the pass complexity of semi-streaming set cover. In: SODA, pp. 1365–1373 (2016)
Chen, L., Hassani, H., Karbasi, A.: Online continuous submodular maximization. In: AISTATS, vol. 84, pp. 1896–1905 (2018)
Chen, W.: An issue in the martingale analysis of the influence maximization algorithm IMM. In: Computational Data and Social Networks, pp. 286–297 (2018)
Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: KDD, pp. 1029–1038 (2010)
Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: KDD, pp. 199–208 (2009)
Chen, W., Yuan, Y., Zhang, L.: Scalable influence maximization in social networks under the linear threshold model. In: ICDM, pp. 88–97 (2010)
Cheng, S., Shen, H., Huang, J., Chen, W., Cheng, X.: IMRank: influence maximization via finding self-consistent ranking. In: SIGIR, pp. 475–484 (2014)
Cohen, E., Delling, D., Pajor, T., Werneck, R.F.: Sketch-based influence maximization and computation: scaling up with guarantees. In: CIKM, pp. 629–638 (2014)
Dagum, P., Karp, R., Luby, M., Ross, S.: An optimal algorithm for Monte Carlo estimation. SIAM J. Comput. 29(5), 1484–1496 (2000)
Demaine, E.D., Indyk, P., Mahabadi, S., Vakilian, A.: On streaming and communication complexity of the set cover problem. In: Discrete Computation, pp. 484–498 (2014)
Domingos, P., Richardson, M.: Mining the network value of customers. In: KDD, pp. 57–66 (2001)
Du, N., Song, L., Gomez Rodriguez, M., Zha, H.: Scalable influence estimation in continuous-time diffusion networks. In: NeurIPS, pp. 3147–3155. Curran Associates, Inc. (2013)
Du, N., Song, L., Yuan, M., Smola, A.J.: Learning networks of heterogeneous influence. In: NeurIPS, pp. 2780–2788. Curran Associates, Inc. (2012)
Duchi, J.C., Bartlett, P.L., Wainwright, M.J.: Randomized smoothing for stochastic optimization. SIAM J. Optim. 22(2), 674–701 (2012)
Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Quart. 3, 95–110 (1956)
Galhotra, S., Arora, A., Roy, S.: Holistic influence maximization: combining scalability and efficiency with opinion-aware models. In: SIGMOD, pp. 743–758 (2016)
Gibbs, D.L., Shmulevich, I.: Solving the influence maximization problem reveals regulatory organization of the yeast cell cycle. PLOS Comput. Biol. (2017)
Goemans, M.X., Williamson, D.P.: New 3/4-approximation algorithms for MAX SAT. SIAM J. Discrete Math. 7, 313–321 (1994)
Goldenberg, J., Libai, B., Muller, E.: Using complex systems analysis to advance marketing theory development. Acad. Market. Sci. Rev. (2001)
Goldenberg, J., Libai, B., Muller, E.: Talk of the network: a complex systems look at the underlying process of word-of-mouth. Market. Lett. 12(3), 211–223 (2001)
Gomez-Rodriguez, M., Balduzzi, D., Schölkopf, B.: Uncovering the temporal dynamics of diffusion networks. In: ICML, pp. 561–568 (2011)
Gomez-Rodriguez, M., Leskovec, J., Schölkopf, B.: Modeling information propagation with survival theory. In: ICML, pp. III–666–III–674 (2013)
Gomez Rodriguez, M., Leskovec, J., Schölkopf, B.: Structure and dynamics of information pathways in online media. In: WSDM, pp. 23–32 (2013)
Goyal, A., Lu, W., Lakshmanan, L.V.S.: Simpath: an efficient algorithm for influence maximization under the linear threshold model. In: ICDM, pp. 211–220 (2011)
Granovetter, M.: Threshold models of collective behavior. Am. J. Soc. 83(6), 1420–1443 (1978)
Guo, Q., Wang, S., Wei, Z., Chen, M.: Influence maximization revisited: efficient reverse reachable set generation with bound tightened. In: SIGMOD, pp. 2167–2181 (2020)
Har-Peled, S., Indyk, P., Mahabadi, S., Vakilian, A.: Towards tight bounds for the streaming set cover problem. In: PODS, pp. 371–383 (2016)
Huang, K., Wang, S., Bevilacqua, G.S., Xiao, X., Lakshmanan, L.V.S.: Revisiting the stop-and-stare algorithms for influence maximization. PVLDB 10(9), 913–924 (2017)
Ienco, D., Bonchi, F., Castillo, C.: The meme ranking problem: maximizing microblogging virality. In: ICDMW, pp. 328–335 (2010)
Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. ICML 28, 427–435 (2013)
Jung, K., Heo, W., Chen, W.: IRIE: scalable and robust influence maximization in social networks. In: ICDM, pp. 918–923 (2012)
Karimi, M., Lucic, M., Hassani, H., Krause, A.: Stochastic submodular maximization: the case of coverage functions. In: NeurIPS, pp. 6853–6863. Curran Associates, Inc. (2017)
Karlin, S.: Mathematical Methods and Theory in Games, Programming, and Economics. Addison-Wesley, Reading (1959)
Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: KDD, pp. 137–146 (2003)
Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? http://an.kaist.ac.kr/traces/WWW2010.html (2010)
Lee, D., Hosanagar, K., Nair, H.: Advertising content and consumer engagement on social media: evidence from Facebook. Manag. Sci. 64 (2018)
Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., Van Briesen, J., Glance, N.: Cost-effective outbreak detection in networks. In: KDD, pp. 420–429 (2007)
Leskovec, J., Krevl, A.: SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)
Li, X., Smith, J.D., Dinh, T.N., Thai, M.T.: Why approximate when you can get the exact? Optimal targeted viral marketing at scale. In: INFOCOM, pp. 1–9 (2017)
Li, Y., Fan, J., Zhang, D., Tan, K.L.: Discovering your selling points: personalized social influential tags exploration. In: SIGMOD, pp. 619–634 (2017)
McDiarmid, C.: Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. (eds.) Probabilistic Methods for Algorithmic Discrete Mathematics. Springer, New York (1998)
Mokhtari, A., Hassani, H., Karbasi, A.: Conditional gradient method for stochastic submodular maximization: closing the gap. AISTATS 84, 1886–1895 (2018)
Muthukrishnan, S.: Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci. 1(2), 117–236 (2005)
Nguyen, H., Nguyen, T., Phan, N.H., Dinh, T.: Importance sketching of influence dynamics in billion-scale networks. In: ICDM, pp. 337–346 (2017)
Nguyen, H.T., Thai, M.T., Dinh, T.N.: Stop-and-stare: Optimal sampling algorithms for viral marketing in billion-scale networks. In: SIGMOD, pp. 695–710 (2016)
Ohsaka, N.: The solution distribution of influence maximization: a high-level experimental study on three algorithmic approaches. In: SIGMOD, pp. 2151–2166 (2020)
Ohsaka, N., Akiba, T., Yoshida, Y., Kawarabayashi, K.I.: Fast and accurate influence maximization on large networks with pruned Monte-Carlo simulations. In: AAAI, pp. 138–144 (2014)
Ohsaka, N., Sonobe, T., Fujita, S., Kawarabayashi, K.I.: Coarsening massive influence networks for scalable diffusion analysis. In: SIGMOD, pp. 635–650 (2017)
Popova, D., Ohsaka, N., Kawarabayashi, K.i., Thomo, A.: Nosingles: a space-efficient algorithm for influence maximization. In: SSDBM, pp. 18:1–18:12 (2018)
Richardson, M., Domingos, P.: Mining knowledge-sharing sites for viral marketing. In: KDD, pp. 61–70 (2002)
Saha, B., Getoor, L.: On maximum coverage in the streaming model & application to multi-topic blog-watch. In: SDM, pp. 697–708 (2009)
Shapiro, H.N.: Note on a computation method in the theory of games. Commun. Pure Appl. Math. 11(4), 587–593 (1958)
Shewan, D.: The comprehensive guide to online advertising costs. https://www.wordstream.com/blog/ws/2017/07/05/online-advertising-costs (2020)
Song, X., Chi, Y., Hino, K., Tseng, B.L.: Information flow modeling based on diffusion rate for prediction and ranking. In: WWW, pp. 191–200 (2007)
Tang, J., Tang, X., Xiao, X., Yuan, J.: Online processing algorithms for influence maximization. In: SIGMOD, pp. 991–1005 (2018)
Tang, Y., Shi, Y., Xiao, X.: Influence maximization in near-linear time: a martingale approach. In: SIGMOD, pp. 1539–1554 (2015)
Tang, Y., Xiao, X., Shi, Y.: Influence maximization: near-optimal time complexity meets practical efficiency. In: SIGMOD, pp. 75–86 (2014)
Wang, C., Chen, W., Wang, Y.: Scalable influence maximization for independent cascade model in large-scale social networks. Data Min. Knowl. Disc. 25(3), 545–576 (2012)
Zhang, K., Bhattacharyya, S., Ram, S.: Large-scale network analysis for online social brand advertising. MIS Q. 40, 849–868 (2016)
Zubcsek, P.P., Sarvary, M.: Advertising to a social network. Quant. Market. Econ. 9, 71–107 (2011)
Appendix A : Omitted proofs
Proof of Lemma 4
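Starting from \(\Pr [X \ge (1 + \epsilon ) \chi ] < \exp \left( - \frac{\epsilon ^2 \chi }{2 (1 + \epsilon /3)} \right) \), where \(\chi {:}{=}\mu t\), a bound violation probability of at most \(\delta \) is needed (i.e., \(\delta = \exp ( - \frac{\epsilon ^2 \chi }{2 (1 + \epsilon /3)} )\)). Rearranging into quadratic form in \(\epsilon \),
\[\chi \epsilon ^2 - \tfrac{2}{3} \log (\tfrac{1}{\delta }) \, \epsilon - 2 \log (\tfrac{1}{\delta }) = 0.\]
By solving the quadratic it can be determined that this is ensured if,
\[\epsilon = \frac{1}{\chi } \left( \tfrac{1}{3} \log (\tfrac{1}{\delta }) + \sqrt{\tfrac{1}{9} \log ^2 (\tfrac{1}{\delta }) + 2 \chi \log (\tfrac{1}{\delta })} \right) .\]
Hence,
\[\Pr \left[ X \ge \chi + \tfrac{1}{3} \log (\tfrac{1}{\delta }) + \sqrt{\tfrac{1}{9} \log ^2 (\tfrac{1}{\delta }) + 2 \chi \log (\tfrac{1}{\delta })} \right] < \delta .\]
From the inner constraint we have,
\[X - \chi - \tfrac{1}{3} \log (\tfrac{1}{\delta }) \ge \sqrt{\tfrac{1}{9} \log ^2 (\tfrac{1}{\delta }) + 2 \chi \log (\tfrac{1}{\delta })}.\]
Squaring both sides (both are non-negative) and rearranging,
\[\chi ^2 - 2 \left( X + \tfrac{2}{3} \log (\tfrac{1}{\delta }) \right) \chi + X^2 - \tfrac{2}{3} X \log (\tfrac{1}{\delta }) \ge 0.\]
Solving for \(\chi \) via the quadratic formula and simplifying (only the smaller root is compatible with the inner constraint, which forces \(\chi \le X\)),
\[\chi \le X + \tfrac{2}{3} \log (\tfrac{1}{\delta }) - \sqrt{\tfrac{2}{9} \log (\tfrac{1}{\delta }) \left( 9 X + 2 \log (\tfrac{1}{\delta }) \right) }.\]
As such we have,
\[\Pr \left[ \chi \le X + \tfrac{2}{3} \log (\tfrac{1}{\delta }) - \sqrt{\tfrac{2}{9} \log (\tfrac{1}{\delta }) \left( 9 X + 2 \log (\tfrac{1}{\delta }) \right) } \right] < \delta .\]
Let \(\text{ LB }(X,\delta ) {:}{=}X + \frac{2}{3} \log (\frac{1}{\delta }) - \sqrt{\frac{2}{9} \log (\frac{1}{\delta }) (9 X + 2 \log (\frac{1}{\delta }))}\) then, \(\Pr [\mu > \text{ LB }(X,\delta )/t] \ge 1 - \delta \) \(\square \)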
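The lower bound of Lemma 4 can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: it assumes that \(X\) is the sum of \(t\) independent Bernoulli samples with mean \(\mu \) (one setting in which the starting concentration bound applies), and the names `lower_bound` and `empirical_coverage` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def lower_bound(x, delta):
    """LB(X, delta) as stated in Lemma 4; divide by t to bound the mean mu."""
    log_term = math.log(1.0 / delta)
    return (x + (2.0 / 3.0) * log_term
            - math.sqrt((2.0 / 9.0) * log_term * (9.0 * x + 2.0 * log_term)))


def empirical_coverage(mu, t, delta, trials, seed=0):
    """Fraction of trials in which mu > LB(X, delta) / t holds.

    Assumption (not from the source): X is a sum of t Bernoulli(mu) draws.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(1 for _ in range(t) if rng.random() < mu)
        if mu > lower_bound(x, delta) / t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Lemma 4 promises coverage of at least 1 - delta.
    print(empirical_coverage(mu=0.3, t=200, delta=0.05, trials=500))
```

Because the bound comes from a worst-case concentration inequality, the observed coverage in such a simulation would be expected to sit above the guaranteed \(1 - \delta \) rather than exactly at it.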
Proof of Lemma 5
Starting from \(\Pr [X \le (1 - \epsilon ) \chi ] \le \exp \left( - \frac{1}{2} \epsilon ^2 \chi \right) \), where \(\chi {:}{=}\mu t\), a concentration bound violation probability of at most \(\delta \) is needed (i.e., \(\delta = \exp ( - \frac{1}{2} \epsilon ^2 \chi )\)). This is ensured if \(\epsilon = \sqrt{-2 \log (\delta ) / \chi }\). Hence,
\[\Pr \left[ X \le \chi - \sqrt{-2 \log (\delta ) \chi } \right] \le \delta .\]
From the inner constraint we have, \(0 \le \chi - \sqrt{-2 \log (\delta ) \chi } - X\). Solving for \(\sqrt{\chi }\) via the quadratic formula gives, \(\sqrt{\chi } \ge \left( \sqrt{-2 \log (\delta )} + \sqrt{4X - 2 \log (\delta )}\right) /2\). As such we have,
\[\Pr \left[ \chi \ge X + \log (\tfrac{1}{\delta }) + \sqrt{\log (\tfrac{1}{\delta }) \left( 2 X + \log (\tfrac{1}{\delta }) \right) } \right] \le \delta .\]
Let \(\text{ UB }(X,\delta ) {:}{=}X + \log (\frac{1}{\delta }) + \sqrt{\log (\frac{1}{\delta })(2 X + \log (\frac{1}{\delta }))}\) then, \(\Pr [\mu < \text{ UB }(X,\delta )/t] \ge 1 - \delta \) \(\square \)
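The algebra connecting the inner constraint to \(\text{ UB }(X,\delta )\) can be sanity-checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, and it merely confirms that squaring the stated expression for \(\sqrt{\chi }\) reproduces \(\text{ UB }(X,\delta )\) and that the inner constraint holds with equality at \(\chi = \text{ UB }(X,\delta )\).

```python
import math


def upper_bound(x, delta):
    """UB(X, delta) as stated in Lemma 5; divide by t to bound the mean mu."""
    log_term = math.log(1.0 / delta)
    return x + log_term + math.sqrt(log_term * (2.0 * x + log_term))


def upper_bound_via_sqrt(x, delta):
    """Square of the quadratic-formula expression for sqrt(chi) in the proof."""
    neg_two_log = -2.0 * math.log(delta)
    root = (math.sqrt(neg_two_log) + math.sqrt(4.0 * x + neg_two_log)) / 2.0
    return root * root


def inner_constraint_residual(x, delta):
    """chi - sqrt(-2 log(delta) chi) - X evaluated at chi = UB(X, delta)."""
    chi = upper_bound(x, delta)
    return chi - math.sqrt(-2.0 * math.log(delta) * chi) - x


if __name__ == "__main__":
    for x in (0.0, 10.0, 50.0, 1000.0):
        # Both differences should vanish up to floating-point error.
        print(x,
              upper_bound(x, 0.01) - upper_bound_via_sqrt(x, 0.01),
              inner_constraint_residual(x, 0.01))
```

A residual of zero means \(\text{ UB }(X,\delta )\) is exactly the boundary value of \(\chi \) for the inner constraint, which is what the final probability statement relies on.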
Proof of Lemma 8
Without loss of generality we will consider the left child of a group. Since the applicability condition succeeds for the parent, we have \(\textstyle T \cdot [\sum _{i \in L \cup R} (1/\lambda _i)^\ell ]^{1/\ell } < 1\). Using this and the fact that the scale parameters are non-negative, we have that \(\textstyle T \cdot [\sum _{i \in L} (1/\lambda _i)^\ell ]^{1/\ell } < 1\). What remains to be shown is that increasing \(\ell \) can only decrease the left-hand side (the minimum shape of a child group can only be equal to or larger than the minimum shape of the parent group).
Consider \(\ell '\) such that \(\ell \le \ell '\) and let \([\sum _{i \in L} (1/\lambda _i)^\ell ]^{1/\ell } < c\) for \(c > 0\); then \(\sum _{i \in L} (1/\lambda _i)^\ell < c^\ell \). For all i, \((1/\lambda _i)^\ell < c ^ \ell \) must hold, since all terms are positive, and hence \((1/\lambda _i) < c\) must also hold. Multiplying both sides of the sum inequality by \(c^{\ell ' - \ell }\) gives \(\sum _{i \in L} (1/\lambda _i)^\ell c^{\ell ' - \ell } < c^{\ell '}\). Since \((1/\lambda _i) < c\) for all i and \(\ell ' - \ell \ge 0\), it follows that \((1/\lambda _i) ^{\ell ' - \ell } \le c ^ {\ell ' - \ell }\), so each term satisfies \((1/\lambda _i)^{\ell '} \le (1/\lambda _i)^\ell c^{\ell ' - \ell }\). From this we have \(\sum _{i \in L} (1/\lambda _i)^{\ell '} < c^{\ell '}\), which gives \([\sum _{i \in L} (1/\lambda _i)^{\ell '}]^{1/\ell '} < c\). \(\square \)
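The two monotonicity facts used in the proof of Lemma 8 can be illustrated with a small Python sketch. It is a hypothetical example rather than code from the paper: `applicability_lhs` is an assumed helper name for the left-hand side \(T \cdot [\sum _i (1/\lambda _i)^\ell ]^{1/\ell }\), and the scale values are made up for illustration.

```python
def applicability_lhs(horizon, scales, shape):
    """T * [sum_i (1/lambda_i)^ell]^(1/ell) for a group of positive scales."""
    total = sum((1.0 / lam) ** shape for lam in scales)
    return horizon * total ** (1.0 / shape)


if __name__ == "__main__":
    horizon = 2.0                      # T (illustrative value)
    parent = [4.0, 5.0, 8.0, 10.0]     # lambda_i for L union R (illustrative)
    left = parent[:2]                  # lambda_i for the left child L

    parent_value = applicability_lhs(horizon, parent, shape=2.0)
    child_same_shape = applicability_lhs(horizon, left, shape=2.0)
    child_larger_shape = applicability_lhs(horizon, left, shape=3.0)

    # Dropping terms cannot increase the sum, and raising the shape
    # cannot increase the norm, so the values are non-increasing.
    print(parent_value, child_same_shape, child_larger_shape)
```

If the parent satisfies the condition (value below 1), both child values are also below 1, which is the statement of the lemma for this toy instance.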
Cite this article
Bevilacqua, G.S., Lakshmanan, L.V.S. A fractional memory-efficient approach for online continuous-time influence maximization. The VLDB Journal 31, 403–429 (2022). https://doi.org/10.1007/s00778-021-00679-0
DOI: https://doi.org/10.1007/s00778-021-00679-0