Attribute based diversification of seeds for targeted influence maximization

doi:10.1016/j.ins.2020.08.093

Information Sciences

Volume 546, 6 February 2021, Pages 1273-1305

Abstract

Embedding diversity into knowledge discovery is important: the patterns mined will be more novel, more meaningful, and broader. Surprisingly, in the classic problem of influence maximization in social networks, relatively little study has been devoted to diversity and its integration into the objective function of an influence maximization method.

In this work, we propose the integration of a categorical-based notion of seed diversity into the objective function of a targeted influence maximization problem. In this respect, we assume that the users of a social network are associated with a categorical dataset where each tuple expresses the profile of a user according to a predefined schema of categorical attributes. Upon this assumption, we design a class of monotone submodular functions specifically conceived for determining the diversity of the subset of categorical tuples associated with the seed users to be discovered. This allows us to develop an efficient approximate method, with a constant-factor guarantee of optimality. More precisely, we formulate the attribute-based diversity-sensitive targeted influence maximization problem under the state-of-the-art reverse influence sampling framework, and we develop a method, dubbed ADITUM, that ensures a $(1 - 1/e - \epsilon)$-approximate solution under the general triggering diffusion model. Extensive experimental evaluation based on real-world networks as well as synthetically generated data has shown the meaningfulness and uniqueness of our proposed class of set diversity functions and of the ADITUM algorithm, also in comparison with methods that exploit numerical-attribute-based diversity and topology-driven diversity in influence maximization.

Introduction

Online social networks (OSNs) are a suitable environment for propagating influence between connected individuals, so that they have become the most profitable channel for a variety of purposes related to viral marketing, advertisement campaigns, news propagation, and many others. In this regard, a classic optimization problem is influence maximization (IM), which is to discover a set of seeds, i.e., initial influencers or early-adopters, that can maximize the spread of information through the network (e.g., advertising of a product) [21]. The basic principle is that, by finding the most effective users to endorse an idea/product/information and to influence other users in the network, a chain reaction of influence can be activated and driven by a “word-of-mouth” effect, in such a way that with a very small marketing cost (i.e., the number of initial influencers) a very large portion of the network will be reached. The extent of this portion can conveniently be limited to a selection of users depending on predetermined constraints, such as based on strategic location or interest in contents that are being diffused; in fact, in many practical scenarios, companies want to tailor their advertisement strategies in order to address only selected OSN-users as potential customers. This is the perspective adopted in the context of targeted IM, which is also a focus of this work.

Maximizing the spread of information is directly related to an a priori specified budget, expressed as the number of seeds. In a more complex “budgeted” scenario of profit maximization, each of these seeds could be associated with a different cost to engage it as an early-adopter, which would require accounting for these costs as constraints in a (targeted) IM problem. Moreover, we have the opportunity to make the seed-selection step in IM more sensitive to user features. In particular, we believe that the “influence potential” of the seeds being selected can be well explained in terms of the diversity that may characterize the seeds.

Intuitively, influencers that are diverse from each other according to certain features (e.g., age, gender, socio-cultural aspects, preferences) might bring more opinions, experiences, and perspectives to bear on the influence propagation process. As a consequence, identifying a set of seed users whose characteristics are as different as possible from each other will help enhance the marketing or information-propagation campaign strategies to engage the target users. Indeed, before making any decision about active involvement in a given propagation scenario, every user in the network would like to acquire enough information, possibly from different perspectives. Therefore, by identifying the most diverse seed users, the triggering stimuli will also be diversified, and since diverse individuals tend to connect to many different types of members, the likelihood of influencing the targets would be higher.

Accounting for diversity in influence propagation has important implications, also from an ethical viewpoint. In fact, favoring diversity in selecting the early-adopters as well as in targeting the users to reach is strictly related to being exposed to diverse opinions: as previously argued in [1], the latter can significantly contribute to disrupting information bubbles or echo chambers (where pre-existing opinions are maintained and reinforced), thus raising the level of democratic debate.

Despite the importance of leveraging diversity for improved solutions to IM problems, it is surprising that relatively few studies have considered diversity in such a context. Some work has focused on understanding relations between diversity, or fairness, and effectiveness/efficiency in the spreading ability [3], [19], [14], [45]. Node diversity was first introduced into the IM task by Tang et al. [46], where numerical attributes reflecting user preferences on some predefined categories (e.g., movie genres) are considered to address a generic IM task. In [6], we originally defined an IM problem that is both targeted and diversity-sensitive for the seed selection; however, it only considers specific notions of diversity that are driven by the topology of the information diffusion graph. Also, [1] studied diversity of exposure, which relies on an item-aware propagation setting.

Contributions. In this work, we aim to advance research on IM by formulating a novel targeted IM problem that accounts for categorical attribute-based diversity of the seeds to be identified. Our contributions are summarized as follows.

• We propose the Attribute-based DIversity-sensitive Targeted InflUence Maximization problem, dubbed ADITUM. A key aspect is that the set of nodes in the network is associated with a categorical dataset, which represents the node profiles according to a schema of categorical attributes and corresponding values.
• We provide conceptually different notions of diversity that are able to reflect the variety in the categorical attributes and their values that characterize the seeds being discovered. Remarkably, we design a class of nondecreasing monotone and submodular functions for categorical diversity, each of which also has the useful property of enabling incremental computation of a node’s marginal gain when added to the current seed set. To the best of our knowledge, we are the first to propose a formal systematization of approaches and functions for determining submodular set diversity in influence propagation and related problems in information networks.
• We design our solution to the ADITUM problem under the Reverse Influence Sampling (RIS) paradigm [4], [49], which is widely recognized as the state-of-the-art approach for IM problems. One challenge that we address is revisiting the RIS framework to deal with both the targeted nature and the diversity-awareness of the ADITUM problem.
• We develop the ADITUM algorithm, which returns a size-$k$ seed set ensuring an approximation ratio of $(1 - 1/e - \epsilon)$ with high probability (at least $1 - |V|^{-1}$), in $O(\epsilon^{-2} k (|E| + |V|) \log |V|)$ time (with $\epsilon$ the sampling error) on a diffusion graph with $|V|$ nodes and $|E|$ edges, under the Triggering model, which is a general diffusion model adopted by most existing work in monotone submodular IM.
• We thoroughly analyzed our proposed diversity functions on synthetically generated datasets, and we experimentally evaluated ADITUM on publicly available network datasets, three of which were used in a user engagement context, one in community interaction, and the other one in recommendation. We made this choice so that we can compare ADITUM against the methods in [46], [6].
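
The second contribution above refers to monotone submodular diversity functions whose marginal gain can be computed incrementally. As a minimal illustration of that property only, the following Python sketch uses a simple coverage-style function (the number of distinct attribute-value pairs covered by the seed set). This function, the class name `CoverageDiversity`, and the toy profiles are assumptions made for illustration; they are not the functions defined in the paper.

```python
from typing import Dict, Hashable, Set, Tuple

Profile = Dict[str, str]  # attribute name -> categorical value


class CoverageDiversity:
    """Hypothetical diversity function: number of distinct (attribute, value)
    pairs covered by the current seed set. Coverage functions of this kind
    are monotone and submodular."""

    def __init__(self, profiles: Dict[Hashable, Profile]) -> None:
        self.profiles = profiles
        self.covered: Set[Tuple[str, str]] = set()

    def marginal_gain(self, node: Hashable) -> int:
        # Only pairs not yet covered contribute, so the gain depends on the
        # running state and not on a recomputation over the whole seed set.
        return sum(1 for pair in self.profiles[node].items()
                   if pair not in self.covered)

    def add(self, node: Hashable) -> int:
        # Add the node to the seed set and return the gain it produced.
        gain = self.marginal_gain(node)
        self.covered.update(self.profiles[node].items())
        return gain

    def value(self) -> int:
        return len(self.covered)


profiles = {
    "u1": {"age": "18-25", "gender": "F"},
    "u2": {"age": "18-25", "gender": "M"},
    "u3": {"age": "26-35", "gender": "M"},
}
div = CoverageDiversity(profiles)
print(div.add("u1"))            # 2: both pairs are new
print(div.marginal_gain("u2"))  # 1: only ("gender", "M") is new
```

The gain of a node can only shrink as the seed set grows, which is the diminishing-returns behavior that submodularity requires.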

Plan of the paper. The remainder of this paper is organized as follows. Section 2 discusses related work, with emphasis on targeted IM and diversity-aware IM. Section 3 formalizes the information diffusion context model, the objective function, and the optimization problem under consideration. Section 4 presents our study on monotone and submodular diversity functions for categorical data modeling the profiles of nodes in a network. Section 5 describes our proposed approach and algorithm for the ADITUM problem. Sections 6 and 7 present our experimental evaluation methodology and results, respectively. In Section 8, we provide our conclusions and pointers for future research.

Section snippets

Related work

Given a weighted directed graph, an information diffusion model, and a positive integer k, the problem of IM is to find a seed set S of size k that maximizes the expected number of active nodes at the end of the diffusion process started from S. The foundations of IM as an optimization problem were initially posed by Kempe et al. in their seminal work [21], and rely on two main findings. The first one is the intractability of the problem in its two sources of complexity, i.e., to discover a k
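
The objective described above, the expected number of active nodes at the end of a diffusion process started from a seed set, is commonly estimated by repeated simulation. The following is a minimal sketch of such a Monte Carlo estimate, assuming the independent cascade model as the diffusion model; the function names, the adjacency-list representation, and the toy graph are illustrative assumptions rather than anything taken from the paper.

```python
import random
from typing import Dict, Hashable, Iterable, List, Tuple

# node -> list of (out-neighbor, activation probability); assumed layout
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def simulate_ic(graph: Graph, seeds: Iterable[Hashable],
                rng: random.Random) -> int:
    """Run one independent cascade and return the number of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance to activate each
                # inactive out-neighbor, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: Iterable[Hashable],
                    runs: int = 1000, seed: int = 0) -> float:
    """Average the cascade size over `runs` independent simulations."""
    rng = random.Random(seed)
    seeds = list(seeds)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


# Deterministic toy chain a -> b -> c (all probabilities equal to 1).
chain = {"a": [("b", 1.0)], "b": [("c", 1.0)]}
print(estimate_spread(chain, ["a"], runs=10))  # 3.0
```

A greedy seed-selection procedure built on this estimator has to rerun the simulations for every candidate node at every step, which is why the estimate is expensive on large graphs.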

Problem statement

Representation model. Given a social network graph $G_{0} = \langle V, E \rangle$, with set of nodes $V$ and set of edges $E$, let $G = G_{0}(b, t) = \langle V, E, b, t \rangle$ be a directed weighted graph representing the information diffusion context associated with $G_{0}$, with $b : E \to (0, 1]$ the edge weighting function and $t : V \to (0, 1]$ the node weighting function.

Function t determines the status of each node as target, i.e., a node toward which the information diffusion process is directed. Given a user-specified threshold $τ_{TS} \in [0, 1]$ , we define the target set $TS$

Monotone and submodular diversity functions for a set of categorical tuples

We assume that the nodes in the social network graph $G_{0} = \langle V, E \rangle$ are associated with side-information in the form of symbolic values that are valid for a predetermined set of categorical attributes, or schema, $\mathcal{A} = \{A_{1}, \dots, A_{m}\}$. For each $A \in \mathcal{A}$, we denote with $\mathrm{dom}_{A}$ its domain, i.e., the set of admissible values known for $A$, and with $\mathrm{dom}$ the union of attribute domains. Moreover, we define $\mathrm{val}_{A} : V \mapsto \mathrm{dom}_{A}$ as a function that associates a node with a value of $A$. For any $S \subseteq V$, we will also use symbols $\mathrm{dom}_{A}(S)$ and $\mathrm{dom}$

A RIS-based framework for the ADITUM problem

We develop our framework for the ADITUM problem based on the Reverse Influence Sampling (RIS) paradigm first introduced in [4] and recognized as the state-of-the-art approach for IM problems.

As discussed in Section 2, the RIS-based approach overcomes the limitations of the Monte Carlo-based greedy approach to IM. The RIS paradigm relies on the following two concepts. Given the diffusion graph $G$ with node set $V$ and edge set $E$, let G be an instance of $G$ obtained by removing each edge $e \in E$ with
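
To make the general RIS idea concrete, here is a minimal sketch of plain reverse influence sampling followed by greedy maximum coverage. It assumes the usual independent cascade convention in which an edge is kept with probability equal to its weight, samples roots uniformly at random, and uses a fixed number of samples. It does not include the target-aware or diversity-aware extensions that ADITUM adds, nor the sample-size analysis behind the approximation guarantee. All names (`random_rr_set`, `greedy_cover`, `ris_seeds`) are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (source, target, weight b)


def random_rr_set(nodes: Sequence[Hashable],
                  in_edges: Dict[Hashable, List[Tuple[Hashable, float]]],
                  rng: random.Random) -> Set[Hashable]:
    """Sample a root and collect the nodes that reach it over kept edges."""
    root = rng.choice(nodes)
    rr = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            # Traverse the edge backwards; keep it with probability p.
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def greedy_cover(rr_sets: List[Set[Hashable]], k: int) -> List[Hashable]:
    """Pick up to k nodes that greedily cover the most sampled sets."""
    covers = defaultdict(set)  # node -> indices of the sets containing it
    for i, rr in enumerate(rr_sets):
        for u in rr:
            covers[u].add(i)
    seeds: List[Hashable] = []
    covered: Set[int] = set()
    for _ in range(k):
        best = max(covers, key=lambda u: len(covers[u] - covered),
                   default=None)
        if best is None or not (covers[best] - covered):
            break  # nothing left to gain
        seeds.append(best)
        covered |= covers[best]
    return seeds


def ris_seeds(nodes: Sequence[Hashable], edges: Sequence[Edge], k: int,
              samples: int = 2000, seed: int = 0) -> List[Hashable]:
    rng = random.Random(seed)
    in_edges = defaultdict(list)
    for u, v, p in edges:
        in_edges[v].append((u, p))
    rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(samples)]
    return greedy_cover(rr_sets, k)


# Star graph: "a" reaches every other node with probability 1.
star_nodes = ["a", "b", "c", "d"]
star_edges = [("a", "b", 1.0), ("a", "c", 1.0), ("a", "d", 1.0)]
print(ris_seeds(star_nodes, star_edges, k=1))  # ['a']
```

The fraction of sampled sets that a seed set intersects serves as an estimate of its spread (up to a scaling factor), so seed selection reduces to a coverage problem over the samples instead of repeated forward simulation.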

Data

We used both synthetic and real-world data for our experimental evaluation. We selected real-world online social networks (OSNs) as input graphs for the influence maximization task, while for the specification of the categorical data, we adopted a twofold methodology: first, we developed a generator of synthetic categorical datasets as a benchmark for an in-depth analysis of the different diversity functions; second, we exploited user profile data, when available, associated with the users in

Stage 1 – sensitivity of diversity functions

To characterize the behavior of our diversity functions, we analyzed their sensitivity to the input categorical data by varying the number of attributes, the number of attribute symbols, and their distribution. Our assessment focuses on the following two statistics: (i) the relative change rate and (ii) the average Jensen-Shannon divergence.
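
For reference, the Jensen-Shannon divergence between two discrete distributions $P$ and $Q$ is $\mathrm{JS}(P \| Q) = \frac{1}{2} \mathrm{KL}(P \| M) + \frac{1}{2} \mathrm{KL}(Q \| M)$ with $M = \frac{1}{2}(P + Q)$. The sketch below implements this standard definition with base-2 logarithms, so values lie in $[0, 1]$. Which distributions the paper compares, and how the divergences are averaged, is not shown in this excerpt, so the code illustrates only the pairwise quantity.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence of two distributions over the same support."""
    # The mixture m is positive wherever p or q is positive, so the KL
    # terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0 (identical distributions)
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0 (disjoint supports)
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric and always finite, which makes it convenient for comparing empirical value distributions that do not share the same support.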

The relative change rate of a diversity function is computed w.r.t. the change in the size of the set of categorical tuples upon which the diversity

Conclusions

We proposed a novel targeted influence maximization problem which accounts for the diversification of the seeds according to side-information available at node level in the general form of categorical attribute values. We defined a class of nondecreasing monotone and submodular functions to determine the diversity of the categorical profiles associated with seed nodes. Our developed RIS-based ADITUM algorithm was compared to two IM methods, the one exploiting topology-driven diversity and the other

Credit authorship contribution statement

Antonio Caliò: Methodology, Software, Formal analysis, Investigation, Data curation. Andrea Tagarelli: Conceptualization, Methodology, Validation, Software, Formal analysis, Investigation, Data curation, Visualization, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (50)

S. Banerjee et al., ComBIM: a community-based solution approach for the Budgeted Influence Maximization Problem, Expert Syst. Appl. (2019)
A. Bozorgi et al., INCIM: a community-based algorithm for influence maximization problem under the linear threshold model, Inf. Process. Manage. (2016)
S. Fujishige, Polymatroid dependence structure of a set of random variables, Inf. Contr. (1978)
F. Gursoy et al., Influence maximization in social networks under deterministic linear threshold model, Knowl.-Based Syst. (2018)
D. Kim et al., Influence maximization based on reachability sketches in dynamic graphs, Inf. Sci. (2017)
X. Li et al., Community-based seeds selection algorithm for location aware influence maximization, Neurocomputing (2018)
X. Li et al., Why approximate when you can get the exact? Optimal targeted viral marketing at scale, Proc. IEEE Conf. on Computer Communications (INFOCOM) (2017)
Y. Li et al., Conformity-aware influence maximization with user profiles
J. Shang et al., Cofim: a community-based framework for influence maximization on large-scale networks, Knowl.-Based Syst. (2017)
S.S. Singh et al., C2im: community based context-aware influence maximization in social networks, Physica A (2019)
Ç. Aslay et al., Maximizing the diversity of exposure in a social network, Proc. IEEE Int. Conf. on Data Mining (ICDM) (2018)
Q. Bao, W. K. Cheung, Y. Zhang, Incorporating structural diversity of neighbors in a diffusion model for social...
C. Borgs, M. Brautbar, J. Chayes, B. Lucier, Maximizing social influence in nearly optimal time, in: Proc. ACM-SIAM...
A. Caliò et al., Topology-driven diversity for targeted influence maximization with application to user engagement in social networks, IEEE Trans. Knowl. Data Eng. (2018)
A. Caliò et al., Cores matter? An analysis of graph decomposition effects on influence maximization problems
S. Chen et al., Online topic-aware influence maximization, PVLDB (2015)
W. Chen et al., Scalable influence maximization for prevalent viral marketing in large-scale social networks
W. Chen et al., Scalable influence maximization in social networks under the linear threshold model, Proc. IEEE Int. Conf. on Data Mining (ICDM) (2010)
Y. Chen, W. Zhu, W. Peng, W. Lee, S. Lee, CIM: community-based influence maximization in social networks, ACM TIST,...
T.M. Cover et al., Elements of Information Theory (2006)
X. Deng et al., Credit distribution for influence maximization in online social networks with node features, J. Intell. Fuzzy Syst. (2016)
Y.-H. Fu, C.-Y. Huang, C.-T. Sun, Using global diversity and local topology features to identify influential network...
A. Goyal et al., Simpath: an efficient algorithm for influence maximization under the linear threshold model
H. Huang et al., Community-based influence maximization for viral marketing, Appl. Intell. (2019)
P. Huang, H. Liu, C. Chen, P. Cheng, The impact of social diversity and dynamic influence propagation for identifying...
