Elsevier

Big Data Research

Volume 25, 15 July 2021, 100208

Identification of Top-K Influencers Based on Upper Confidence Bound and Local Structure

https://doi.org/10.1016/j.bdr.2021.100208

Abstract

We study the problem of identifying top-K influencers when only local knowledge of the network structure is available. More specifically, the selection of top-K influencers is performed sequentially over a number of rounds. We propose an efficient algorithm called strength network similarity-based upper confidence bound (SNS_UCB1) for the identification of top-K influencers, based on the upper confidence bound (UCB1) algorithm from the multi-armed bandit framework. As feedback for online decision-making, we rely on the strength of an edge (arm), that is, its falling within a large number of other edges, and on how similar the edge's members are to each other and can thus convince other users to adopt the promoted behaviours. This feedback serves as the reward score at each pull of an arm and indicates how likely the selection is to increase the cumulative reward. We evaluate the proposed algorithm under the independent cascade (IC) model on four large-scale datasets that differ in size and density. We compare our algorithm with a centrality measure-based UCB1 and several well-known state-of-the-art approaches, demonstrating its superior performance in terms of influence spread, achieved with less time and storage space.

Introduction

Online social networks, or social media, are used by billions of people around the globe and have become part of everyday life. The spread of social media platforms that let people get in touch, share data, and interact easily online for various purposes is unprecedented. Influence maximization is an interesting research topic that has attracted attention from various fields and disciplines. It has potential applications in viral marketing, which aims to influence a small number of nodes, called the seed set, that will trigger a large cascading process. Specifically, the influence maximization problem attempts to intelligently identify the top-K influencers that will affect a large number of users within the network after having been selected to promote ideas, content, products, and innovations. Numerous studies have addressed this problem. Kempe et al. [1] were the first to formulate it as a combinatorial problem and proved that the identification of top-K influencers is NP-hard. They proposed a greedy hill-climbing algorithm that was observed to outperform other methods in terms of influence spread but suffered from noticeable scalability issues; as argued by Chen et al. [2], running the greedy algorithm on a graph with a few hundred thousand nodes takes a long time to identify the seed set, even on a modern server. While various efforts have been made to optimize the running time [3][4][5], it remains the main drawback of greedy approaches applied to a moderately large graph. Other studies have proposed methods based on centrality measures that provide performance guarantees on the spread of influence with a low running time [2][6][7][8][9].
Other approaches have tried to learn the propagation probability, the main ingredient for increasing the number of influenced users, without assuming knowledge of the underlying diffusion model; these include methods based on multi-armed bandit algorithms. Similarly, the use of structural network properties has improved influence coverage and lowered runtime complexity in various research studies. Hence, selecting top-K influencers based on network measures seems promising: centrality measures can be exploited to boost the spread of influence in an online decision-making setting in which the marketer decides at each round whether to choose a candidate user as a seed, aiming to increase the cumulative reward and, equivalently, minimize the regret. Multi-armed bandit algorithms have recently been used in influence maximization, and their performance in learning the propagation probability and thus reducing regret has been demonstrated [10][11][12].
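For reference, the equivalence between increasing the cumulative reward and minimizing the regret can be written out using the usual stochastic bandit definitions. The symbols below (r_t for the reward observed at round t, a_t for the arm pulled at round t, μ_a for the expected reward of arm a, T for the number of rounds) are notational assumptions introduced here for illustration and are not taken from the paper.

```latex
\[
  \mathrm{Rew}(T) = \sum_{t=1}^{T} r_t ,
  \qquad
  R(T) = T\,\mu^{*} - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right],
  \qquad
  \mu^{*} = \max_{a} \mu_{a} .
\]
```

Since the expected cumulative reward equals the expectation of the sum of μ_{a_t}, and T μ* does not depend on the arms chosen, a larger expected cumulative reward corresponds exactly to a smaller regret R(T).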

Because identification is both a key constraint and an essential ingredient of the influence maximization problem, an effective strategy must combine a low running time with a high influence spread. In this work, we consider the use of the upper confidence bound (UCB1) and its impact on the cascading process under the IC model. Unlike existing studies that focused on learning the propagation probability without any knowledge of the diffusion model used, the present work identifies top-K influencers by using the UCB1 algorithm together with a newly designed local measure that aggregates the strength of edges (arms) and the similarity of edge members to each other. This measure is used as a reward to efficiently select arms (edges), locating the edge that is attached to the highest number of other edges locally, up to a certain number of hops. The reward is then fed into the UCB1 algorithm from the MAB framework, and the arms are selected sequentially according to their reward score, which depends mainly on local centrality measures that significantly reduce the running time of the designed algorithm. After the arms have been selected, the candidate seed set is chosen according to node degree. Hence, our approach is independent of the diffusion model used and tries to select the most influential users based only on structural properties and on the number of times the arms have been selected; the more times an arm has been selected, the less likely it is to be chosen.
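The selection behaviour described above (an arm becomes less attractive the more often it has been pulled) is the core of the standard UCB1 rule, which adds an exploration bonus of sqrt(2 ln t / n_a) to each arm's average reward. The following is a minimal sketch of that generic rule only. The function names `ucb1_select` and `run_ucb1` and the caller-supplied `reward_fn` are hypothetical; in particular, `reward_fn` is a placeholder and does not implement the paper's strength and similarity measure.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the index of the arm with the largest UCB1 score at round t."""
    # Pull every arm once before the confidence bound is meaningful.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Average reward plus an exploration bonus that shrinks as an arm is pulled more.
    scores = [
        means[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(reward_fn, n_arms, rounds):
    """Run UCB1 for `rounds` pulls; reward_fn(arm) is a placeholder reward oracle."""
    counts = [0] * n_arms      # number of pulls per arm
    means = [0.0] * n_arms     # running average reward per arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

In an edge-as-arm setting such as the one described here, `reward_fn` would return the local score of the chosen edge; with a noisy placeholder reward, the pull counts concentrate on the arm with the highest average reward as the number of rounds grows.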

The purpose of this paper is to select the best top-K influencers based only on the local structure, using an online learning algorithm called upper confidence bound (UCB1) that keeps the time and storage space complexity low while providing an acceptable influence spread. The main contributions of this work can be outlined as follows:

  • Proposing a new reward function that locally measures the arm strength and arm member similarity;

  • Introducing a new algorithm called the strength network similarity-based upper confidence bound (SNS_UCB1);

  • Evaluating the proposed algorithm with four real-world datasets under the independent cascade (IC) model;

  • Comparing the proposed algorithm to state-of-the-art approaches in terms of time complexity, storage needed and influence achieved under the IC model; and

  • Comparing the proposed SNS_UCB1 algorithm to several centrality measures embedded in the UCB1 algorithm to test how our algorithm and the proposed measure perform in terms of influence achieved, required time, storage space, cumulative reward, and regret.
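The independent cascade (IC) model named in the contributions above is commonly evaluated by Monte Carlo simulation: each newly activated node gets one chance to activate each inactive out-neighbour with probability p, and the influence spread of a seed set is the average number of activated nodes over many runs. The sketch below illustrates that generic procedure under stated assumptions. The adjacency-dictionary graph format, the function names `simulate_ic` and `estimate_spread`, and the default of 1000 runs are hypothetical choices, not the paper's experimental code; the default p=0.1 matches the propagation probability reported in the experiments.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """Run one independent cascade and return the set of activated nodes.

    graph maps each node to an iterable of its out-neighbours.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # A newly active node u has a single chance to activate v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, p=0.1, runs=1000, seed=0):
    """Estimate the expected influence spread of `seeds` by Monte Carlo averaging."""
    rng = random.Random(seed)  # fixed seed so repeated estimates are reproducible
    total = sum(len(simulate_ic(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs
```

A seed set produced by any selection method can be passed to `estimate_spread` to compare methods on equal terms, for example for seed set sizes from K=10 to K=50.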

The rest of this paper is organized as follows. Section 2 discusses the studies most closely related to the proposed algorithm. Section 3 provides a formulation of the system model and introduces the proposed measure. Section 4 presents the SNS_UCB1 algorithm and its analysis and discusses its time and space complexity. Section 5 presents the results of extensive experiments performed with the proposed algorithm, followed by conclusions and a discussion of future research directions in Section 6.


Related work

This section focuses on presenting the studies most closely related to the proposed approach. We focus mainly on approaches based on local structural properties under any diffusion or epidemic models. Afterwards, we present a sample of recent studies performed in the context of identifying the most influential users based on multi-armed bandit algorithms.

Salavati et al. [13] proposed a new ranking method called “gateway local rank” (GLR) based on the closeness centrality by using the local

Formulation of the system model

Consider the social network as a graph denoted by G=(V,E), where n=|V| users are linked through m=|E| social relationships of various kinds. The users in the social graph can share and exchange information for various reasons. Table 1 provides the notation used throughout the manuscript with the corresponding definitions.

The main objective of this work is to maximize the spread of certain information in the network. This problem of influence maximization has been extensively

Top-K influencers' identification algorithm and analysis

In this section, we present the algorithm for the identification of top-K influencers and analyze its performance in terms of the time and space required to determine the influence achieved under the IC model. We first provide an overview of the upper confidence bound algorithm from the multi-armed bandit framework, which is used for the selection of the most influential users. We propose, in this work, an algorithm to maximize influence under the IC model and use the upper confidence bound, a

Experimental results

In this section, the results of simulations performed to evaluate the proposed algorithm “SNS_UCB1” against the state-of-the-art approaches are presented. We start by evaluating the “SNS_UCB1” algorithm in terms of influence achieved and time and storage space required under the IC model with propagation probability p=0.1 to select top-K influencers and compute the influence spread for seed set sizes varying from K=10 to K=50. The parameters of our algorithm are set as follows: α=0.1,β=0.09,Lci=

Conclusion

In this paper, we presented a new algorithm called “SNS_UCB1” for the identification of top-K influencers based on the upper confidence bound in a social network with the purpose of achieving a higher influence spread with lower time complexity and needed storage space, operating on a large-scale graph with different structures and densities. We focused on designing a new local centrality measure that combines the location of central edges in the network and the quality of users' selection, in

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research is supported by two grants from the National Natural Science Foundation of China (Project Nos. 61272277 and 91746206).

The authors thank Alibaba Cloud Co., Ltd. for technical support.

References (24)

  • C. Salavati et al.

    Ranking nodes in complex networks based on local structure and improving closeness centrality

    Neurocomputing

    (2019)
  • Q. Liu et al.

    Leveraging local h-index to identify and rank influential spreaders in networks

    Phys. A, Stat. Mech. Appl.

    (2018)
  • D. Kempe et al.

    Maximizing the spread of influence through a social network

  • W. Chen et al.

    Efficient influence maximization in social networks

  • J. Leskovec et al.

    Cost-effective outbreak detection in networks

  • A. Goyal et al.

    Celf++: optimizing the greedy algorithm for influence maximization in social networks

  • C. Zhou et al.

    An upper bound based greedy algorithm for mining top-k influential nodes in social networks

  • M. Alshahrani et al.

    Efficient methods to select top-k propagators based on distance and radius neighbor

  • M. Alshahrani et al.

    Top-k influential users selection based on combined Katz centrality and propagation probability

  • M. Alshahrani et al.

    Selection of top-k influential users based on radius-neighborhood degree, multi-hops distance and selection threshold

    J. Big Data

    (2018)
  • X. Wang et al.

    Maximizing the spread of influence via generalized degree discount

    PLoS ONE

    (2016)
  • S. Vaswani et al.

    Influence maximization with bandits
