Elsevier

Knowledge-Based Systems

Volume 193, 6 April 2020, 105502

Deep Collaborative Embedding for information cascade prediction

https://doi.org/10.1016/j.knosys.2020.105502

Abstract

Recently, information cascade prediction has attracted increasing interest from researchers, but it is far from being well solved, partly due to three defects of existing works. First, existing works often assume an underlying information diffusion model, which is impractical in the real world due to the complexity of information diffusion. Second, existing works often ignore the prediction of infection order, which also plays an important role in social network analysis. Finally, existing works often require the underlying diffusion network, which is likely unobservable in practice. In this paper, we aim to predict both node infection and infection order without requiring knowledge of the underlying diffusion mechanism or the diffusion network. This poses two challenges: the first is which cascading characteristics of nodes should be captured and how to capture them, and the second is how to model the non-linear features of nodes in information cascades. To address these challenges, we propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which captures not only the node structural property but also two kinds of node cascading characteristics. We propose an auto-encoder based collaborative embedding framework that learns node embeddings through cascade collaboration and node collaboration, so that the non-linearity of information cascades can be effectively captured. Extensive experiments conducted on real-world datasets verify the effectiveness of our approach.

Introduction

In recent years, as more and more people use services such as Facebook, Twitter, and Weibo, information cascades have become ubiquitous in online social networks, motivating a large body of research [1], [2], [3], [4], [5]. An important research topic is information cascade prediction, which aims to predict who will be infected by a piece of information in the future [6], [7], [8], [9], where infection refers to a user resharing (retweeting) or commenting on a tweet, a photo, or another piece of information [10].

While many methods have been proposed for information cascade prediction [6], [11], [12], [13], [14], existing works often suffer from three defects. First, they often focus on predicting the probability that a node will be infected in the future given the nodes infected in the past, but ignore the prediction of infection order, i.e., which nodes will be infected earlier or later than others. However, predicting the infection order is important in many scenarios. For example, knowing who will be the next infected node is helpful for blocking the spread of rumors [15], [16]. Second, existing methods often assume that information diffusion follows a parametric model such as the Independent Cascade (IC) model [17] or the Susceptible–Infected (SI) model [18]. In the real world, however, information diffusion processes are so complicated that we seldom know exactly how information diffuses [19]. Finally, existing works often assume that the explicit paths along which information propagates between nodes are observable. Yet in many scenarios we can only observe that nodes get infected, not who infected them [12]. For example, in viral marketing, one can track whether a customer buys a product, but it is difficult to determine exactly who influenced her or him.

In this paper, we address information cascade prediction without requiring knowledge of the underlying diffusion mechanism or the diffusion network. This is difficult due to the following two major challenges:

  • Cascading Characteristics. The probability that a node is infected by a cascade, and its relative infection order, mainly depend on its cascading characteristics, which reveal its relation to other nodes in that cascade. Existing methods often consider only the static structural properties of nodes, for example, node neighborship in a static social network. However, the cascading characteristics of a node intuitively vary across cascades, and different cascades can contain entirely different infection ranges or orders of nodes. For example, in some cascades a node may often get infected by certain nodes, while in other cascades it may be more susceptible to different nodes, even though its structural properties remain the same. Intuitively, different contents often lead to different cascading characteristics of a node and result in different underlying mechanisms in different cascades. However, in many situations it is not easy to recognize the content (i.e., what is diffused) and its underlying diffusion mechanism (i.e., why and how it is diffused). For example, we often do not know which virus is spreading in an epidemic, yet we can observe which nodes are infected and when. To make predictions for cascades in such situations, we have to explicitly model the observable cascading characteristics, which arguably also capture, implicitly, the effect of the unobservable content and underlying mechanism. Therefore, which cascading characteristics of nodes should be captured, and how to capture them, are crucial to our purpose.

  • Cascading Non-linearity. Information cascades are often non-linear. The non-linearity comes from two sources: the non-linear dynamics of information cascades, and the non-linear structure of the social networks on which cascades occur. As a result, when nodes spread the content of a cascade, they exhibit non-linear cascading patterns (e.g., emergence patterns) that existing shallow models cannot effectively recognize. How to capture the non-linear features of nodes in information cascades is therefore also a critical challenge for our problem.

Inspired by the strong network representation learning ability of deep learning demonstrated in recent works [20], [21], [22], we propose a novel model called Deep Collaborative Embedding (DCE) for predicting infection and infection order in cascades, which learns embeddings without assumptions about the underlying diffusion model or diffusion network. The main idea of DCE is to collaboratively embed the nodes, using a deep architecture, into a latent space where the closer the embeddings of two nodes are, the more likely the two nodes will be infected in the same cascade and the closer their infection times will be.
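To make the prediction idea concrete, the following minimal Python sketch ranks not-yet-infected nodes by their mean Euclidean distance to the nodes already infected in a cascade, treating a smaller distance as both a higher infection likelihood and an earlier expected infection. The distance measure, the mean aggregation, and the names `euclidean` and `rank_candidates` are illustrative assumptions, not the scoring rule used by DCE, and the embeddings are toy values.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(embeddings: Dict[str, Vector], infected: List[str]) -> List[str]:
    """Rank not-yet-infected nodes, most likely (and earliest) first.

    Hypothetical scoring rule: a candidate's score is its mean distance to the
    already-infected nodes in the latent space; smaller means closer.
    """
    if not infected:
        raise ValueError("at least one infected node is required")
    infected_set = set(infected)
    scores: Dict[str, float] = {}
    for node, vec in embeddings.items():
        if node in infected_set:
            continue
        dists = [euclidean(vec, embeddings[u]) for u in infected]
        scores[node] = sum(dists) / len(dists)
    # Sort by ascending distance; ties are broken by node id for determinism.
    return sorted(scores, key=lambda n: (scores[n], n))
```

For example, with embeddings `{"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [1.0, 1.0], "d": [3.0, 3.0]}` and only `"a"` infected, the predicted infection order is `b`, `c`, `d`. Using the same ranking for both "who" and "in what order" reflects the single latent space described above, in which proximity is meant to encode both co-infection and closeness of infection times.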

Unlike traditional network embedding methods [20], [23], [24], [25], which mainly focus on preserving the static structural properties of nodes in a network, DCE captures not only the node structural property but also two kinds of node cascading characteristics that are important for predicting node infection and infection order. The first is the cascading context, which reveals the temporal relation of nodes in a cascade. The cascading context of a node consists of two aspects: the potential influence it receives from earlier-infected nodes, and their relative temporal positions in the cascade. The second is the cascading affinity, which reveals the co-occurrence relation of nodes in cascades. Cascading affinity essentially reflects the probability that two nodes will be infected by the same cascade: the higher the cascading affinity between two nodes, the more likely they are to co-occur in a cascade. Intuitively, the cascading characteristics of nodes reflect the effect of the unobservable underlying diffusion mechanisms and diffusion networks. Therefore, by explicitly preserving the node cascading characteristics, the learned embeddings also implicitly capture the effect of the unobservable underlying diffusion mechanisms and diffusion networks, which makes it feasible to make cascade predictions based on the similarity between embeddings in the latent space. As the experiments show, owing to their ability to capture cascading characteristics, the embeddings learned by DCE perform better in the task of infection prediction.
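The two cascading characteristics can be illustrated with a small Python sketch. It assumes a cascade is a mapping from node to infection time. The exponential time decay with parameter `tau` used for the influence of earlier-infected nodes, the rank-based relative position, and the Jaccard co-occurrence used as a proxy for cascading affinity are all hypothetical choices made for illustration, not necessarily the definitions used by DCE.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Cascade = Dict[str, float]  # node id -> infection time


def cascading_context(cascade: Cascade, node: str,
                      tau: float = 1.0) -> Tuple[Dict[str, float], float]:
    """Return (influence from earlier-infected nodes, relative position).

    Influence of an earlier node u on `node` is a hypothetical exponential decay
    exp(-(t_node - t_u) / tau); the relative position is the node's 0-based rank
    in infection order divided by the cascade size.
    """
    t_v = cascade[node]
    influence = {u: math.exp(-(t_v - t_u) / tau)
                 for u, t_u in cascade.items() if t_u < t_v}
    order = sorted(cascade, key=lambda u: (cascade[u], u))
    position = order.index(node) / len(order)
    return influence, position


def cascading_affinity(cascades: List[Cascade]) -> Dict[Tuple[str, str], float]:
    """Jaccard co-occurrence of node pairs across cascades (an assumed proxy).

    affinity(u, v) = #cascades containing both / #cascades containing either.
    """
    counts: Dict[str, int] = {}
    pair_counts: Dict[Tuple[str, str], int] = {}
    for c in cascades:
        nodes = sorted(c)  # sorted so each pair key is (smaller id, larger id)
        for v in nodes:
            counts[v] = counts.get(v, 0) + 1
        for u, v in combinations(nodes, 2):
            pair_counts[(u, v)] = pair_counts.get((u, v), 0) + 1
    return {(u, v): n / (counts[u] + counts[v] - n)
            for (u, v), n in pair_counts.items()}
```

Pairs that never co-occur are absent from the returned dictionary, which corresponds to zero affinity under this proxy. The context function is computed per cascade, so the same node can receive different contexts in different cascades, matching the observation that cascading characteristics vary across cascades.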

To effectively capture the non-linearity of information cascades, we introduce an auto-encoder based collaborative embedding architecture for DCE. DCE consists of multiple layers of non-linear transformations through which the non-linear cascading patterns of nodes can be effectively encoded into the embeddings. DCE learns node embeddings collaboratively, through two kinds of collaboration: cascade collaboration and node collaboration. First, given the observation that a node often participates in more than one cascade with different contents, DCE collaboratively encodes the node's cascading context features in each of these cascades into its embedding. In other words, the embedding of a node is learned with the collaboration of the cascades the node participates in, which we call cascade collaboration. At the same time, DCE embeds all nodes concurrently, so that the embedding of each node is generated under constraints from its relation to other nodes, i.e., its cascading affinity to other nodes and its neighborship in the social network. In other words, the embeddings of nodes are learned in collaboration with each other, which we call node collaboration.
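The collaborative objective described above can be sketched as a loss that sums a reconstruction term over all nodes (cascade collaboration, assuming each node's input vector aggregates its cascading context features across the cascades it joins) with pairwise terms that pull together nodes with high cascading affinity or social-network neighborship (node collaboration). The following pure-Python sketch only evaluates such a loss for given weights. The layer form `tanh(Wx + b)`, the squared-distance penalties, and the weights `alpha` and `beta` are assumptions rather than the paper's exact formulation, and training by gradient descent is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights W, bias b)


def forward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Apply a stack of non-linear layers, each computing tanh(W x + b)."""
    out = list(x)
    for w, b in layers:
        out = [math.tanh(sum(wij * xj for wij, xj in zip(row, out)) + bi)
               for row, bi in zip(w, b)]
    return out


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def collaborative_loss(features: Dict[str, List[float]],
                       encoder: List[Layer], decoder: List[Layer],
                       affinity: Dict[Tuple[str, str], float],
                       neighbors: List[Tuple[str, str]],
                       alpha: float = 0.1, beta: float = 0.1) -> float:
    """Hypothetical DCE-style objective: reconstruction + node-collaboration terms."""
    # Embed every node concurrently with the shared encoder.
    z = {v: forward(x, encoder) for v, x in features.items()}
    # Cascade collaboration: each node's input (assumed to aggregate its cascading
    # context over all cascades it joins) must be recoverable from its embedding.
    recon = sum(sq_dist(forward(z[v], decoder), x) for v, x in features.items())
    # Node collaboration, part 1: pull together pairs with high cascading affinity.
    aff = sum(w * sq_dist(z[u], z[v]) for (u, v), w in affinity.items())
    # Node collaboration, part 2: pull together neighbours in the social network.
    nbr = sum(sq_dist(z[u], z[v]) for u, v in neighbors)
    return recon + alpha * aff + beta * nbr
```

Because the encoder is shared and the pairwise terms couple embeddings, changing one node's embedding affects the loss contributed by its high-affinity partners and social neighbours, which is the sense in which the embeddings are learned "in collaboration" in this sketch.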

The major contributions of this paper can be summarized as follows:

  1. We propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction that requires no knowledge of the underlying diffusion mechanism or the diffusion network. The node embeddings learned by DCE benefit not only infection prediction but also the prediction of the infection order of nodes in a cascade.

  2. We propose an auto-encoder based collaborative embedding framework for DCE, which collaboratively learns node embeddings that preserve the node cascading characteristics, including cascading context and cascading affinity, as well as the structural property.

  3. Extensive experiments conducted on real datasets verify the effectiveness of our proposed model.

The rest of this paper is organized as follows. We give the preliminaries in Section 2. The cascading characteristics are defined and modeled in Section 3. Section 4 presents our proposed model, and Section 5 analyzes the experimental results. Finally, we briefly review related work in Section 6 and conclude in Section 7.

Section snippets

Basic definitions

We denote a social network as $G=(V,E)$, where $V$ is the set of $N$ nodes and $E \subseteq V \times V$ is the set of edges. Let $\mathcal{C}=\{C_1, C_2, \ldots, C_M\}$ be the set of $M$ information cascades. An information cascade $C_m$ ($1 \le m \le M$) observed on a social network $G$ is defined as a set of timestamped infections, i.e., $C_m=\{(v, t_v^{(m)}) \mid v \in V \wedge t_v^{(m)} < \infty\}$, where $(v, t_v^{(m)})$ indicates that node $v$ is infected by cascade $C_m$ at time $t_v^{(m)}$. We also write $v_i \in C_m$ if node $v_i$ participates in cascade $C_m$. Additionally, we use $C_m(t)=\{(v, t_v^{(m)}) \mid v \in V \wedge t_v^{(m)} < t\}$ to
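These definitions map naturally onto a small data structure. In this hypothetical Python sketch, a cascade $C_m$ stores each infected node's time $t_v^{(m)}$; an absent node is treated as having $t_v^{(m)} = \infty$, membership mirrors $v_i \in C_m$, and `before(t)` returns the infections with $t_v^{(m)} < t$, as in the definition of $C_m(t)$. The class and method names are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cascade:
    """Hypothetical container for C_m: node id -> infection time t_v^(m).

    Nodes not in `times` are uninfected, i.e. t_v^(m) = infinity.
    """
    times: Dict[str, float] = field(default_factory=dict)

    def infect(self, v: str, t: float) -> None:
        """Record the timestamped infection (v, t)."""
        self.times[v] = t

    def __contains__(self, v: str) -> bool:
        """v in cascade  <=>  v participates in C_m."""
        return v in self.times

    def time(self, v: str) -> float:
        """t_v^(m), or math.inf if v was never infected."""
        return self.times.get(v, math.inf)

    def before(self, t: float) -> "Cascade":
        """C_m(t): the infections that happened strictly before time t."""
        return Cascade({v: tv for v, tv in self.times.items() if tv < t})
```

Representing uninfected nodes by absence rather than by an explicit infinite timestamp keeps each cascade proportional to its size rather than to $N$.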

Modeling cascading characteristics

Cascading characteristics of a node reveal its relation to other nodes in information cascades and are crucial to the prediction of node infection and infection order. In this section, we define two kinds of cascading characteristics, the cascading context and the cascading affinity, which will be encoded into the learned embeddings.

Deep collaborative embedding

In this paper, we propose an auto-encoder based Deep Collaborative Embedding (DCE) model, which learns embeddings for nodes in a given social network from the $M$ cascades $C_1, \ldots, C_M$ observed on the network, so that the learned embeddings can be used for cascade prediction without knowing the underlying diffusion mechanisms or the explicit diffusion networks. In this section, we first present the architecture of the DCE model in detail, and then we describe

Experiments

In this section, we present the details of experiments conducted on real-world datasets. The experiments include two parts: hyper-parameter tuning and verification of DCE. In particular, to verify the effectiveness of DCE, we check whether the embeddings learned by DCE improve the performance of information cascade prediction on the real-world datasets.

Related work

In this section, we briefly review two lines of work related to our research: network embedding and information cascade prediction.

Conclusions

In this paper, we address the problem of information cascade prediction in online social networks with network embedding techniques. We propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which learns embeddings for not only infection prediction but also infection order prediction in a cascade, without requiring knowledge of the underlying diffusion mechanisms or the diffusion network. We propose an auto-encoder based collaborative

CRediT authorship contribution statement

Yuhui Zhao: Conceptualization, Methodology, Software, Writing - original draft. Ning Yang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Tao Lin: Supervision. Philip S. Yu: Conceptualization, Methodology, Writing - original draft.

Acknowledgments

This work is supported by National Natural Science Foundation of China under grant 61972270, and in part by National Science Foundation under grants III-1526499, III-1763325, III-1909323, CNS-1930941, and CNS-1626432.

References (58)

  • Li C. et al. DeepCas: An end-to-end predictor of information cascades.
  • Sun Y. et al. Collaborative inference of coexisting information diffusions.
  • Li Y. et al. Influence maximization on social graphs: A survey. IEEE Trans. Knowl. Data Eng. (2018).
  • Saito K. et al. Prediction of information diffusion probabilities for independent cascade model.
  • Guille A. et al. A predictive model for the temporal dynamics of information diffusion in online social networks.
  • Wang J. et al. Topological recurrent neural network for diffusion prediction.
  • Gao S. et al. A novel embedding method for information diffusion prediction in social network big data. IEEE Trans. Ind. Inf. (2017).
  • Bourigault S. et al. Learning social network embeddings for predicting information diffusion.
  • Gomez-Rodriguez M. et al. Inferring networks of diffusion and influence. ACM Trans. Knowl. Discov. Data (2011).
  • Bourigault S. et al. Representation learning for information diffusion through social networks: an embedded cascade model.
  • Zhang X. et al. IAD: Interaction-aware diffusion framework in social networks. IEEE Trans. Knowl. Data Eng. (2019).
  • Guille A. et al. Information diffusion in online social networks: a survey. ACM SIGMOD Rec. (2013).
  • Goldenberg J. et al. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Mark. Lett. (2001).
  • Radcliffe J. The mathematical theory of infectious diseases and its applications. J. R. Stat. Soc. Ser. C. Appl. Stat. (1977).
  • Steeg G.V. et al. Information-theoretic measures of influence based on content dynamics.
  • Wang D. et al. Structural deep network embedding.
  • Liao L. et al. Attributed social network embedding. IEEE Trans. Knowl. Data Eng. (2018).
  • Chang S. et al. Heterogeneous network embedding via deep architectures.
  • Tang J. et al. LINE: Large-scale information network embedding.

  • Cited by (27)

    • A predictive model based on user awareness and multi-type rumors forwarding dynamics

      2023, Information Sciences
      Citation Excerpt :

      Chen et al. [8] proposed a semi-supervised method, called Recurrent Cascades Convolutional Networks (CasCN), which explicitly models and predicts cascades through learning the latent representation of both structural and temporal information, without involving any other features. Zhao et al. [46] proposed an auto-encoder collaborative embedding framework to learn node embedding through cascade and node collaborations. Chen et al. [7] proposed a novel graph neural network-based model, called MUCas, to learn the latent representations of cascade graphs from a multi-scale perspective, which can make full use of the direction-scale, high-order-scale, position-scale, and dynamic-scale of cascades via a newly designed MUlti-scale Graph Capsule Network (MUG-Caps) and the influence-attention mechanism. The nonlinear relationships of rumor cascades can be effectively captured and used to construct features related to rumor contexts for application to different network rumor propagation analysis studies.

    • CasSeqGCN: Combining network structure and temporal sequence to predict information cascades

      2022, Expert Systems with Applications
      Citation Excerpt :

      Cao et al. (2020) applies specifically designed graph neural network models to capture the change of node state as well as the network structure. Zhao, Yang et al. (2020) uses structural property and the order of cascaded nodes to predict the future sequence of cascades. Xu et al. (2020) combines the representation of network and the time of retweet to predict the future spreading size.

    • Transformer-enhanced Hawkes process with decoupling training for information cascade prediction

      2022, Knowledge-Based Systems
      Citation Excerpt :

      Consequently, those particular downsides plus the increased computational burdens make these models inefficient. Several other deep learning techniques such as reinforcement learning [51], attention mechanism [20,52,53], and auto-encoder [26,54] are also used for information cascade prediction. Instead of solely depending on observed information cascade, the work [55] relies on parameters that fit previous similar cascades and infers new parameters accordingly.

    • HeDAN: Heterogeneous diffusion attention network for popularity prediction of online content

      2022, Knowledge-Based Systems
      Citation Excerpt :

      Therefore, the simultaneous consideration of all cascade samples can help to learn the interaction intimacy between users from their historical forwarding behaviors, which is helpful for information diffusion modeling. Among the methods [27,31] that co-process all cascades, Feng et al. [27] proposed two higher-order graphs with cascades as nodes, which were constructed based on the similarity between cascades, and learned the higher-order graphs by random walks and semi-supervised language models so that cascades with similar structure and content had closer representations. The idea of directly establishing the relationship between messages in that method is worth considering.


    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105502.
