Elsevier

Information Sciences

Volume 546, 6 February 2021, Pages 1273-1305
Attribute based diversification of seeds for targeted influence maximization

https://doi.org/10.1016/j.ins.2020.08.093

Abstract

Embedding diversity into knowledge discovery is important: the patterns mined will be more novel, more meaningful, and broader. Surprisingly, in the classic problem of influence maximization in social networks, relatively little study has been devoted to diversity and its integration into the objective function of an influence maximization method.

In this work, we propose the integration of a categorical-based notion of seed diversity into the objective function of a targeted influence maximization problem. In this respect, we assume that the users of a social network are associated with a categorical dataset where each tuple expresses the profile of a user according to a predefined schema of categorical attributes. Upon this assumption, we design a class of monotone submodular functions specifically conceived for determining the diversity of the subset of categorical tuples associated with the seed users to be discovered. This allows us to develop an efficient approximate method, with a constant-factor guarantee of optimality. More precisely, we formulate the attribute-based diversity-sensitive targeted influence maximization problem under the state-of-the-art reverse influence sampling framework, and we develop a method, dubbed ADITUM, that ensures a $(1-1/e-\epsilon)$-approximate solution under the general triggering diffusion model. Extensive experimental evaluation based on real-world networks as well as synthetically generated data has shown the meaningfulness and uniqueness of our proposed class of set diversity functions and of the ADITUM algorithm, also in comparison with methods that exploit numerical-attribute-based diversity and topology-driven diversity in influence maximization.

Introduction

Online social networks (OSNs) are a suitable environment for propagating influence between connected individuals, and they have therefore become the most profitable channel for a variety of purposes related to viral marketing, advertisement campaigns, news propagation, and many others. In this regard, a classic optimization problem is influence maximization (IM), which is to discover a set of seeds, i.e., initial influencers or early-adopters, that can maximize the spread of information through the network (e.g., advertising of a product) [21]. The basic principle is that, by finding the most effective users to endorse an idea/product/information and to influence other users in the network, a chain reaction of influence can be activated and driven by a “word-of-mouth” effect, so that a very large portion of the network is reached at a very small marketing cost (i.e., the number of initial influencers). The extent of this portion can conveniently be limited to a selection of users depending on predetermined constraints, for instance strategic location or interest in the contents being diffused; in fact, in many practical scenarios, companies want to tailor their advertisement strategies in order to address only selected OSN users as potential customers. This is the perspective adopted in the context of targeted IM, which is also a focus of this work.

Maximizing the spread of information is directly related to an a priori specified budget, expressed as the number of seeds. In a more complex “budgeted” scenario of profit maximization, each of these seeds could be associated with a different cost to engage it as an early-adopter, which would require accounting for these costs as constraints in a (targeted) IM problem. Moreover, we have the opportunity to make the seed-selection step in IM more sensitive to user features. In particular, we believe that the “influence potential” of the seeds being selected can be well explained in terms of the diversity that characterizes the seeds.

Intuitively, influencers that differ from one another according to certain features (e.g., age, gender, socio-cultural aspects, preferences) might have more opinions, experiences, and perspectives to bring to bear on the influence propagation process. As a consequence, identifying a set of seed users whose characteristics are as different as possible from each other will help enhance the marketing or information-propagation campaign strategies to engage the target users. Indeed, before taking any decision for active involvement in a given propagation scenario, every user in the network would like to acquire enough information, possibly from different perspectives. Therefore, by identifying the most diverse seed users, the triggering stimuli will also be diversified, and since diverse individuals tend to connect to many different types of members, the likelihood of influencing the targets would be higher.

Accounting for diversity in influence propagation has important implications, also from an ethical viewpoint. In fact, favoring diversity in selecting the early-adopters as well as in targeting the users to reach is strictly related to being exposed to diverse opinions: as previously argued in [1], the latter can significantly contribute to disrupt information bubbles or echo chambers — where pre-existing opinions are maintained and reinforced — thus raising the level of democratic debate.

Despite the importance of leveraging diversity for improved solutions to IM problems, surprisingly few studies have considered diversity in such a context. Some work has focused on understanding relations between diversity, or fairness, and effectiveness/efficiency in the spreading ability [3], [19], [14], [45]. Node diversity was first introduced into the IM task by Tang et al. [46], where numerical attributes reflecting user preferences on some predefined categories (e.g., movie genres) are considered to address a generic IM task. In [6], we originally defined an IM problem that is both targeted and diversity-sensitive for the seed selection; however, that problem only considers specific notions of diversity that are driven by the topology of the information diffusion graph. Also, [1] studied diversity of exposure, which relies on an item-aware propagation setting.

Contributions. In this work, we aim to advance research on IM by formulating a novel targeted IM problem that accounts for categorical attribute-based diversity of the seeds to be identified. Our contributions are summarized as follows.

  • We propose the Attribute-based DIversity-sensitive Targeted InflUence Maximization problem, dubbed ADITUM. A key aspect is that the set of nodes in the network is associated with a categorical dataset, which represents the node profiles according to a schema of categorical attributes and corresponding values.

  • We provide conceptually different notions of diversity that are able to reflect the variety in the categorical attributes and their values that characterize the seeds being discovered. Remarkably, we design a class of nondecreasing monotone and submodular functions for categorical diversity, each of which also has the nice property of enabling incremental computation of a node’s marginal gain when added to the current seed set. To the best of our knowledge, we are the first to propose a formal systematization of approaches and functions for determining submodular set diversity in influence propagation and related problems in information networks.

  • We design our solution to the ADITUM problem under the Reverse Influence Sampling (RIS) paradigm [4], [49], which is widely recognized as the state-of-the-art approach for IM problems. One challenge that we address is revisiting the RIS framework to deal with both the targeted nature and the diversity-awareness of the ADITUM problem.

  • We develop the ADITUM algorithm, which returns a size-$k$ seed set ensuring an approximation ratio of $(1-1/e-\epsilon)$, with high probability (at least $1-|V|^{-1}$), in $O(\epsilon^{-2} k (|E|+|V|) \log |V|)$ time (with sampling error $\epsilon$) on a diffusion graph with $|V|$ nodes and $|E|$ edges, under the Triggering model, which is a general diffusion model adopted by most existing work in monotone submodular IM.

  • We thoroughly analyzed our proposed diversity functions on synthetically generated datasets, and we experimentally evaluated ADITUM on publicly available network datasets, three of which were used in a user engagement context, one in community interaction, and the other one in recommendation. We made this choice to enable comparison of ADITUM against the methods in [46], [6].
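
As a rough illustration of the second contribution above, the sketch below shows one simple set function that is monotone, submodular, and supports incremental marginal-gain computation: the number of distinct (attribute, value) pairs covered by the seed set. This coverage function is an assumption chosen for illustration and is not claimed to be one of the functions defined in the paper; the names `coverage_diversity`, `marginal_gain`, and the toy `profiles` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical profile format: attribute name -> categorical value.
Profile = Dict[str, str]
Pair = Tuple[str, str]


def covered_pairs(seeds: Iterable[Hashable],
                  profiles: Dict[Hashable, Profile]) -> Set[Pair]:
    """Collect the distinct (attribute, value) pairs exhibited by the seeds."""
    covered: Set[Pair] = set()
    for node in seeds:
        covered.update(profiles[node].items())
    return covered


def coverage_diversity(seeds: Iterable[Hashable],
                       profiles: Dict[Hashable, Profile]) -> int:
    """Illustrative diversity score: size of the covered pair set.

    Coverage functions of this kind are monotone and submodular.
    """
    return len(covered_pairs(seeds, profiles))


def marginal_gain(node: Hashable, covered: Set[Pair],
                  profiles: Dict[Hashable, Profile]) -> int:
    """Gain of adding `node`, computed only from the pairs already covered."""
    return sum(1 for pair in profiles[node].items() if pair not in covered)


# Toy data (made up for the example).
profiles = {
    "u1": {"gender": "F", "age": "18-25"},
    "u2": {"gender": "M", "age": "18-25"},
    "u3": {"gender": "F", "age": "26-40"},
}
```

Because the gain of a candidate depends only on the set of pairs already covered, a greedy selection loop can keep that set up to date and evaluate each candidate in time proportional to the number of attributes, without recomputing the score of the whole seed set.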

Plan of the paper. The remainder of this paper is organized as follows. Section 2 discusses related work, with emphasis on targeted IM and diversity-aware IM. Section 3 formalizes the information diffusion context model, the objective function, and the optimization problem under consideration. Section 4 presents our study on monotone and submodular diversity functions for categorical data modeling the profiles of nodes in a network. Section 5 describes our proposed approach and algorithm for the ADITUM problem. Sections 6 and 7 contain our experimental evaluation methodology and results, respectively. In Section 8, we provide our conclusions and pointers for future research.

Section snippets

Related work

Given a weighted directed graph, an information diffusion model, and a positive integer k, the problem of IM is to find a seed set S of size k that maximizes the expected number of active nodes at the end of the diffusion process started from S. The foundations of IM as an optimization problem were initially posed by Kempe et al. in their seminal work [21], and rely on two main findings. The first one is the intractability of the problem in its two sources of complexity, i.e., to discover a k

Problem statement

Representation model. Given a social network graph $G_0 = \langle V, E \rangle$, with set of nodes $V$ and set of edges $E$, let $G = G_0(b,t) = \langle V, E, b, t \rangle$ be a directed weighted graph representing the information diffusion context associated with $G_0$, with $b: E \to (0,1]$ the edge weighting function and $t: V \to (0,1]$ the node weighting function.

Function $t$ determines the status of each node as target, i.e., a node toward which the information diffusion process is directed. Given a user-specified threshold $\tau_{TS} \in [0,1]$, we define the target set $TS$

Monotone and submodular diversity functions for a set of categorical tuples

We assume that the nodes in the social network graph $G_0 = \langle V, E \rangle$ are associated with side-information in the form of symbolic values that are valid for a predetermined set of categorical attributes, or schema, $\mathcal{A} = \{A_1, \ldots, A_m\}$. For each $A \in \mathcal{A}$, we denote with $dom_A$ its domain, i.e., the set of admissible values known for $A$, and with $dom$ the union of attribute domains. Moreover, we define $val_A: V \to dom_A$ as a function that associates a node with a value of $A$. For any $S \subseteq V$, we will also use symbols $dom_A(S)$ and $dom$

A RIS-based framework for the ADITUM problem

We develop our framework for the ADITUM problem based on the Reverse Influence Sampling (RIS) paradigm first introduced in [4] and recognized as the state-of-the-art approach for IM problems.
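
To make the paradigm more concrete, the following is a minimal sketch of plain RIS, assuming the independent cascade model (a special case of the Triggering model): random reverse-reachable (RR) sets are sampled, and seeds are then chosen greedily to cover as many RR sets as possible. It is deliberately untargeted and diversity-agnostic, so it is not the ADITUM algorithm; the function names, the `in_edges` adjacency format, and the toy graph are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical format: node -> list of (in-neighbor, propagation probability).
InEdges = Dict[str, List[Tuple[str, float]]]


def random_rr_set(nodes: List[str], in_edges: InEdges,
                  rng: random.Random) -> Set[str]:
    """Sample one RR set: pick a random root, then walk edges backwards,
    keeping each incoming edge independently with its probability."""
    root = rng.choice(nodes)
    rr, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for u, prob in in_edges.get(v, []):
            if u not in rr and rng.random() < prob:
                rr.add(u)
                frontier.append(u)
    return rr


def greedy_seeds(rr_sets: List[Set[str]], k: int) -> List[str]:
    """Greedy maximum coverage: repeatedly pick the node that appears in
    the largest number of RR sets not yet covered."""
    remaining = list(rr_sets)
    seeds: List[str] = []
    for _ in range(k):
        counts: Dict[str, int] = defaultdict(int)
        for rr in remaining:
            for node in rr:
                counts[node] += 1
        if not counts:
            break  # every sampled RR set is already covered
        # Sorting first makes tie-breaking deterministic (smallest name wins).
        best = max(sorted(counts), key=counts.get)
        seeds.append(best)
        remaining = [rr for rr in remaining if best not in rr]
    return seeds


# Toy graph (made up): a -> b and a -> c, both with probability 1.0.
nodes = ["a", "b", "c"]
in_edges: InEdges = {"b": [("a", 1.0)], "c": [("a", 1.0)]}
rng = random.Random(0)
rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(50)]
seeds = greedy_seeds(rr_sets, k=1)
```

In this toy graph every RR set contains node `a`, so a single seed covers all samples. In the general setting, the fraction of RR sets covered by a seed set serves as an estimate of its expected spread, up to a factor of $|V|$.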

As discussed in Section 2, the RIS-based approach overcomes the limitations of the Monte Carlo based greedy approach to IM. The RIS paradigm relies on the following two concepts. Given the diffusion graph $G$ with node set $V$ and edge set $E$, let $\widehat{G}$ be an instance of $G$ obtained by removing each edge $e \in E$ with

Data

We used both synthetic and real-world data for our experimental evaluation. We selected real-world online social networks (OSNs) as input graphs for the influence maximization task, while for the specification of the categorical data, we adopted a twofold methodology: firstly, we developed a generator of synthetic categorical datasets as benchmark for an in-depth analysis of the different diversity functions; secondly, we exploited user profile data, when available, associated to the users in

Stage 1 – sensitivity of diversity functions

To characterize the behavior of our diversity functions, we analyzed their sensitivity to the input categorical data, by varying the number of attributes, the number of attribute symbols, and their distribution. Our assessment is focused around the following two statistics: (i) the relative change rate and (ii) the average Jensen-Shannon divergence.
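
For reference, the Jensen-Shannon divergence between two discrete distributions can be computed as in the sketch below, which uses base-2 logarithms so that the value lies in $[0, 1]$. How the paper builds the distributions and averages the divergence is not shown in this excerpt, so the snippet only covers the standard pairwise definition; the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL of p and q to their mixture m.

    Since m_i > 0 whenever p_i > 0 or q_i > 0, no division by zero occurs.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


identical = js_divergence([0.5, 0.5], [0.5, 0.5])   # same distribution
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])    # disjoint supports
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and always finite, which makes it convenient for comparing value distributions whose supports differ.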

The relative change rate of a diversity function is computed w.r.t. the change in the size of the set of categorical tuples upon which the diversity

Conclusions

We proposed a novel targeted influence maximization problem which accounts for the diversification of the seeds according to side-information available at node level in the general form of categorical attribute values. We defined a class of nondecreasing monotone and submodular functions to determine diversity of the categorical profiles associated to seed nodes. Our developed RIS-based ADITUM algorithm was compared to two IM methods, the one exploiting topology-driven diversity and the other

Credit authorship contribution statement

Antonio Caliò: Methodology, Software, Formal analysis, Investigation, Data curation. Andrea Tagarelli: Conceptualization, Methodology, Validation, Software, Formal analysis, Investigation, Data curation, Visualization, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (50)

  • Ç. Aslay et al., Maximizing the diversity of exposure in a social network, Proc. IEEE Int. Conf. on Data Mining (ICDM) (2018)
  • Q. Bao, W. K. Cheung, Y. Zhang, Incorporating structural diversity of neighbors in a diffusion model for social...
  • C. Borgs, M. Brautbar, J. Chayes, B. Lucier, Maximizing social influence in nearly optimal time, in: Proc. ACM-SIAM...
  • A. Caliò et al., Topology-driven diversity for targeted influence maximization with application to user engagement in social networks, IEEE Trans. Knowl. Data Eng. (2018)
  • A. Caliò et al., Cores matter? An analysis of graph decomposition effects on influence maximization problems
  • S. Chen et al., Online topic-aware influence maximization, PVLDB (2015)
  • W. Chen et al., Scalable influence maximization for prevalent viral marketing in large-scale social networks
  • W. Chen et al., Scalable influence maximization in social networks under the linear threshold model, Proc. IEEE Int. Conf. on Data Mining (ICDM) (2010)
  • Y. Chen, W. Zhu, W. Peng, W. Lee, S. Lee, CIM: community-based influence maximization in social networks, ACM TIST,...
  • T.M. Cover et al., Elements of Information Theory (2006)
  • X. Deng et al., Credit distribution for influence maximization in online social networks with node features, J. Intell. Fuzzy Syst. (2016)
  • Y.-H. Fu, C.-Y. Huang, C.-T. Sun, Using global diversity and local topology features to identify influential network...
  • A. Goyal et al., Simpath: an efficient algorithm for influence maximization under the linear threshold model
  • H. Huang et al., Community-based influence maximization for viral marketing, Appl. Intell. (2019)
  • P. Huang, H. Liu, C. Chen, P. Cheng, The impact of social diversity and dynamic influence propagation for identifying...