A fractional memory-efficient approach for online continuous-time influence maximization
The VLDB Journal (IF 2.8), Pub Date: 2021-06-12, DOI: 10.1007/s00778-021-00679-0
Glenn S. Bevilacqua, Laks V. S. Lakshmanan

Influence maximization (IM) under a continuous-time diffusion model requires finding a set of initial adopters which when activated lead to the maximum expected number of users becoming activated within a given amount of time. State-of-the-art approximation algorithms applicable to solving this intractable problem use reverse reachability influence samples to approximate the diffusion process. Unfortunately, these algorithms require storing large collections of such samples which can become prohibitive depending on the desired solution quality, properties of the diffusion process and seed set size. To remedy this, we design an algorithm that allows the influence samples to be processed in a streaming manner, avoiding the need to store them. We approach IM using two fractional objectives: a fractional relaxation and a multi-linear extension of the original objective function. We derive a progressively improved upper bound to the optimal solution, which we empirically find to be tighter than the best existing upper bound. This enables instance-dependent solution quality guarantees that are observed to be vastly superior to the theoretical worst case. Leveraging these, we develop an algorithm that delivers solutions with a superior empirical solution quality guarantee at comparable running time with greatly reduced memory usage compared to the state-of-the-art. We demonstrate the superiority of our approach via extensive experiments on five real datasets of varying sizes of up to 41M nodes and 1.5B edges.
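
To make the streaming idea concrete, here is a minimal Python sketch, not the authors' algorithm. It draws reverse reachability samples one at a time, folds each into two running sums, and discards it. Several things here are assumptions that the abstract does not state: edge delays are exponentially distributed, the graph is given as a hypothetical `in_edges` dictionary mapping a node `v` to a list of `(u, rate)` pairs for edges `u -> v`, and the two fractional objectives take their textbook forms for coverage functions, namely `min(1, sum of x over the sample)` for the relaxation and `1 - product of (1 - x) over the sample` for the multi-linear extension. All function and parameter names are illustrative.

```python
import heapq
import random


def sample_rr_set(n, in_edges, horizon, rng):
    """Draw one reverse reachability sample.

    Picks a uniformly random target node and returns every node that can
    reach it within `horizon` time units, using freshly sampled edge delays.
    Exponential delays are an assumption of this sketch.
    """
    target = rng.randrange(n)
    best = {target: 0.0}          # best known reverse distance per node
    heap = [(0.0, target)]
    settled = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue              # stale heap entry
        settled.add(v)
        for u, rate in in_edges.get(v, ()):
            nd = d + rng.expovariate(rate)   # sampled delay on edge u -> v
            if nd <= horizon and nd < best.get(u, float("inf")):
                best[u] = nd
                heapq.heappush(heap, (nd, u))
    return settled


def stream_fractional_estimates(n, in_edges, horizon, x, num_samples, seed=0):
    """Estimate both fractional objectives at the point `x` in one pass.

    `x` maps node -> value in [0, 1]; missing nodes count as 0.
    Only two running sums are kept; no sample is stored.
    """
    rng = random.Random(seed)
    relax_sum = 0.0
    multi_sum = 0.0
    for _ in range(num_samples):
        rr = sample_rr_set(n, in_edges, horizon, rng)
        mass = 0.0
        miss = 1.0
        for u in rr:
            xu = x.get(u, 0.0)
            mass += xu            # total fractional mass inside the sample
            miss *= 1.0 - xu      # chance no node of the sample is picked
        relax_sum += min(1.0, mass)
        multi_sum += 1.0 - miss
        # `rr` goes out of scope here, so memory does not grow with num_samples
    return n * relax_sum / num_samples, n * multi_sum / num_samples


if __name__ == "__main__":
    # Toy path graph 0 -> 1 -> 2 with unit delay rates (hypothetical data).
    toy_in_edges = {1: [(0, 1.0)], 2: [(1, 1.0)]}
    relaxed, multilinear = stream_fractional_estimates(
        3, toy_in_edges, horizon=1.0, x={0: 0.5, 1: 0.5}, num_samples=1000
    )
    print(relaxed, multilinear)
```

For any `x`, the relaxation value is at least the multi-linear value on every sample, and the two coincide when `x` is integral, in which case both reduce to the usual sample-based estimate of the expected spread of the chosen seed set.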




Updated: 2021-06-13