Deep Collaborative Embedding for information cascade prediction☆
Introduction
In recent years, as more and more people use services such as Facebook, Twitter, and Weibo, information cascades have become ubiquitous in online social networks, which has motivated a large body of research [1], [2], [3], [4], [5]. An important research topic is information cascade prediction, whose purpose is to predict who will be infected by a piece of information in the future [6], [7], [8], [9], where infection refers to a user resharing (retweeting) or commenting on a tweet, a photo, or another piece of information [10].
Although many methods have been proposed for information cascade prediction [6], [11], [12], [13], [14], the existing works often suffer from three defects. First, they often focus on predicting the probability that a node will be infected in the future given the nodes infected in the past, but ignore the prediction of infection order, i.e., which nodes will be infected earlier or later than others. However, predicting the infection order is important in many scenarios. For example, knowing which node will be infected next is helpful for blocking the spread of rumors [15], [16]. Second, the existing methods often assume that information diffusion follows a parametric model such as the Independent Cascade (IC) model [17] or the Susceptible–Infected (SI) model [18]. In the real world, however, information diffusion processes are so complicated that we seldom know exactly the underlying mechanisms by which information diffuses [19]. Finally, the existing works often assume that the explicit paths along which information propagates between nodes are observable. Yet in many scenarios we can only observe that nodes get infected but cannot know who infects them [12]. For example, in viral marketing, one can track whether a customer buys a product, but it is difficult to determine exactly who influenced her/him.
In this paper, we address the problem of information cascade prediction without requiring knowledge of the underlying diffusion mechanism or the diffusion network. This is difficult because of the following two major challenges:
- Cascading Characteristics. The probability that a node is infected by a cascade and its relative infection order mainly depend on its cascading characteristics, which reveal its relation to other nodes in that cascade. The existing methods often consider only the static structural properties of nodes, for example, the node neighborship in a static social network. However, the cascading characteristics of a node intuitively vary across cascades, and different cascades can contain totally different infection ranges or orders of nodes. For example, in some cascades a node may often get infected by certain nodes, but in other cascades it may be more susceptible to different nodes, even though its structural properties remain the same. Intuitively, different contents often lead to different cascading characteristics of a node and result in different underlying mechanisms in different cascades. However, in many situations it is not easy to recognize the content (i.e., what is diffused) and its underlying diffusion mechanism (i.e., why and how it is diffused). For example, we often do not know what virus is being propagated in a plague, but which nodes are infected and when can be observed. To make predictions for cascades in such situations, we have to explicitly model the observable cascading characteristics, which arguably also capture, implicitly, the effect of the unobservable content and underlying mechanism. Therefore, which cascading characteristics of nodes should be captured and how to capture them are crucial to our purpose.
- Cascading Non-linearity. Information cascades are often non-linear. The non-linearity arises from two sources: the dynamics of the information cascades, and the structure of the social networks on which the cascades unfold. As a result, when nodes spread the content of a cascade, they exhibit non-linear cascading patterns (e.g., emergence patterns) that the existing shallow models cannot effectively recognize. How to capture the non-linear features of nodes in information cascades is therefore also a critical challenge for our problem.
Inspired by the impressive network representation learning ability of deep learning demonstrated by recent works [20], [21], [22], we propose a novel model called Deep Collaborative Embedding (DCE) for the prediction of infection and infection order in cascades, which can learn the embeddings without assumptions about the underlying diffusion model or diffusion network. The main idea of DCE is to collaboratively embed the nodes with a deep architecture into a latent space where the closer the embeddings of two nodes are, the more likely the two nodes are to be infected in the same cascade and the closer their infection times will be.
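The intuition that proximity in the latent space signals likely co-infection can be illustrated with a minimal sketch. The embedding values, node names, and the helper `rank_candidates` are hypothetical and are not taken from DCE; the sketch only assumes that every node already has a learned vector, and it scores each uninfected node by its smallest Euclidean distance to any already infected node.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(embeddings, infected):
    """Rank uninfected nodes by distance to the nearest infected node.

    A smaller distance is read as "more likely to be infected, and sooner".
    This scoring rule is an illustrative assumption, not the paper's predictor.
    """
    scores = {}
    for node, vec in embeddings.items():
        if node in infected:
            continue
        scores[node] = min(euclidean(vec, embeddings[s]) for s in infected)
    # Sort node names by ascending distance.
    return sorted(scores, key=scores.get)


# Hypothetical 2-dimensional embeddings for five nodes.
embeddings = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.2),
    "c": (0.9, 1.0),
    "d": (0.3, 0.1),
    "e": (2.0, 2.0),
}
ranking = rank_candidates(embeddings, infected={"a"})
print(ranking)
```

The position of a node in `ranking` serves here as a stand-in for both its infection likelihood and its predicted infection order, which is the dual use of embedding similarity described above.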
Different from the traditional network embedding methods [20], [23], [24], [25], which mainly focus on preserving the static structural properties of nodes in a network, DCE can capture not only the node structural property but also two kinds of node cascading characteristics that are important for the prediction of node infection and infection order. One is the cascading context, which reveals the temporal relation of nodes in a cascade. The cascading context of one node consists of two aspects, including the potential influence it receives from earlier infected nodes and their temporal relative positions in a cascade. The other kind of cascading characteristic captured by DCE is the cascading affinity, which reveals the co-occurrence relation of nodes in cascades. Cascading affinity essentially reflects the probability that two nodes will be infected by the same cascade. Higher cascading affinity between two nodes indicates that it is more likely for them to co-occur in a cascade. Intuitively, the cascading characteristics of nodes reflect the effect of the unobservable underlying diffusion mechanisms and diffusion networks. Therefore, by explicitly preserving the node cascading characteristics, the learned embeddings also implicitly capture the effect of unobservable underlying diffusion mechanisms and diffusion network, which makes it feasible to make cascade predictions in terms of the similarity between embeddings in the latent space. As we will see later in the experiments, due to the ability to capture the cascading characteristics, the embeddings learned by DCE show a better performance in the task of infection prediction.
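As a rough illustration of a co-occurrence relation between nodes, the following sketch computes a Jaccard-style score for every pair of nodes from a few toy cascades. The formula, the function name `cooccurrence_affinity`, and the data are assumptions made for illustration; the formal definition of cascading affinity used by DCE is given later in the paper and may differ.

```python
from itertools import combinations


def cooccurrence_affinity(cascades):
    """Jaccard-style co-occurrence score for every pair of nodes.

    Each cascade is a dict mapping a node to its infection time.
    The score is (#cascades containing both nodes) / (#cascades containing either).
    This formula is an illustrative assumption, not the paper's definition.
    """
    membership = {}
    for idx, cascade in enumerate(cascades):
        for node in cascade:
            membership.setdefault(node, set()).add(idx)

    affinity = {}
    for u, v in combinations(sorted(membership), 2):
        shared = membership[u] & membership[v]
        if shared:  # keep only pairs that co-occur at least once
            union = membership[u] | membership[v]
            affinity[(u, v)] = len(shared) / len(union)
    return affinity


# Three hypothetical cascades over four nodes.
cascades = [
    {"a": 1.0, "b": 2.5, "c": 4.0},
    {"a": 0.5, "b": 1.0},
    {"c": 2.0, "d": 3.0},
]
affinity = cooccurrence_affinity(cascades)
for pair, score in sorted(affinity.items()):
    print(pair, round(score, 3))
```

Pairs that never appear in the same cascade are simply absent from the result, which keeps the structure sparse when most nodes never co-occur.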
To effectively capture the non-linearity of information cascades, we introduce an auto-encoder based collaborative embedding architecture for DCE. DCE consists of multi-layer non-linear transformations by which the non-linear cascading patterns of nodes can be effectively encoded into the embeddings. DCE learns embeddings for nodes in a collaborative way, with two kinds of collaboration: cascade collaboration and node collaboration. First, in light of the observation that a node often participates in more than one cascade with different contents, DCE can collaboratively encode the cascading context features of a node in each cascade into its embedding. In other words, the embedding of a node is learned with the collaboration of the cascades the node participates in, which we call the cascade collaboration. At the same time, DCE embeds the nodes concurrently, so that the embedding for a node is generated under the constraints of its relations to other nodes, i.e., its cascading affinity to other nodes and its neighborship in the social network. In other words, the embeddings of nodes are learned with the collaboration of each other, which we call the node collaboration.
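The role of the multi-layer non-linear transformations can be sketched with a plain auto-encoder forward pass. This is not the DCE architecture: the layer sizes, the tanh activation, the random untrained weights, and all function names below are assumptions chosen only to show how an input feature vector is squeezed into a low-dimensional embedding and then reconstructed. Training and the collaborative constraints are omitted.

```python
import math
import random


def dense(x, weights, bias):
    """One fully connected layer followed by a tanh non-linearity."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def autoencode(x, encoder, decoder):
    """Return (embedding, reconstruction) for an input vector x."""
    h = x
    for weights, bias in encoder:
        h = dense(h, weights, bias)
    embedding = h
    for weights, bias in decoder:
        h = dense(h, weights, bias)
    return embedding, h


def reconstruction_loss(x, x_hat):
    """Squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


rng = random.Random(0)
sizes = [6, 4, 2]  # input dimension -> hidden dimension -> embedding dimension
encoder = [init_layer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]
decoder = [
    init_layer(sizes[i + 1], sizes[i], rng)
    for i in reversed(range(len(sizes) - 1))
]

x = [1.0, 0.0, 0.5, 0.0, 0.0, 0.25]  # hypothetical feature vector of one node
embedding, x_hat = autoencode(x, encoder, decoder)
loss = reconstruction_loss(x, x_hat)
print(len(embedding), len(x_hat), loss >= 0.0)
```

In a trained model the weights would be fitted to minimize a loss of this kind together with whatever additional constraints the method imposes; here the weights stay random, so only the shapes and the data flow are meaningful.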
The major contributions of this paper can be summarized as follows:
1. We propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction that does not require knowledge of the underlying diffusion mechanism or the diffusion network. The node embeddings learned by DCE benefit not only infection prediction but also the prediction of the infection order of nodes in a cascade.
2. We propose an auto-encoder based collaborative embedding framework for DCE, which can collaboratively learn the node embeddings, preserving the node cascading characteristics, including cascading context and cascading affinity, as well as the structural property.
3. Extensive experiments conducted on real datasets verify the effectiveness of our proposed model.
The rest of this paper is organized as follows. We give the preliminaries in Section 2. The cascading context is defined and modeled in Section 3. In Section 4 we present our proposed model, and in Section 5 we analyze the experimental results. Finally, we briefly review the related work in Section 6 and conclude in Section 7.
Section snippets
Basic definitions
We denote a social network as G = (V, E), where V is the node set comprising N nodes and E is the edge set. Let C be the set of information cascades. An information cascade c (c ∈ C) observed on a social network G is defined as a set of timestamped infections, i.e., c = {(v, t)}, where (v, t) represents that node v is infected by cascade c at time t. We also say v ∈ c if node v participates in cascade c. Additionally, we use to
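The description of a cascade as a set of timestamped infections maps naturally onto a small data structure. The sketch below is only one possible representation, assuming a cascade is stored as a dictionary from node to infection time; the node names, times, and helper functions are hypothetical.

```python
def infection_order(cascade):
    """Return the nodes of a cascade sorted by infection time, earliest first."""
    return [node for node, _ in sorted(cascade.items(), key=lambda kv: kv[1])]


def earlier_infected(cascade, node):
    """Return the nodes infected strictly before `node` in the same cascade."""
    t = cascade[node]
    return {u for u, tu in cascade.items() if tu < t}


# One hypothetical cascade: node -> infection time.
cascade = {"v3": 7.0, "v1": 0.0, "v2": 2.5}

order = infection_order(cascade)
print(order)                            # nodes from earliest to latest infection
print("v2" in cascade)                  # membership test: does v2 participate?
print(earlier_infected(cascade, "v3"))  # nodes that could have influenced v3
```

Keeping the timestamps rather than only the order makes it possible to derive both who was infected and in what sequence, the two quantities the prediction task is concerned with.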
Modeling cascading characteristics
The cascading characteristics of a node reveal its relation to other nodes in information cascades and are crucial to the prediction of node infection and infection order. In this section, we define two kinds of cascading characteristics, the cascading context and the cascading affinity, which will be encoded into the learned embeddings.
Deep collaborative embedding
In this paper, we propose an auto-encoder based Deep Collaborative Embedding (DCE) model, which learns embeddings for nodes in a given social network from the cascades observed on the network, so that the learned embeddings can be used for cascade prediction without knowing the underlying diffusion mechanisms or the explicit diffusion networks. In this section, we first present the architecture of the DCE model in detail, and then we describe
Experiments
In this section, we present the details of the experiments conducted on real-world datasets. The experiments consist of two parts: the tuning of the hyper-parameters and the verification of DCE. In particular, to verify the effectiveness of DCE, we check whether the embeddings learned by DCE improve the performance of information cascade prediction on the real-world datasets.
Related work
In this section, we briefly review two lines of work related to our research: network embedding and information cascade prediction.
Conclusions
In this paper, we address the problem of information cascade prediction in online social networks with network embedding techniques. We propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which can learn embeddings for not only infection prediction but also infection order prediction in a cascade, without requiring knowledge of the underlying diffusion mechanisms or the diffusion network. We propose an auto-encoder based collaborative
CRediT authorship contribution statement
Yuhui Zhao: Conceptualization, Methodology, Software, Writing - original draft. Ning Yang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Tao Lin: Supervision. Philip S. Yu: Conceptualization, Methodology, Writing - original draft.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under grant 61972270, and in part by the National Science Foundation under grants III-1526499, III-1763325, III-1909323, CNS-1930941, and CNS-1626432.
References (58)
- et al. Heterogeneous anomaly detection in social diffusion with discriminative feature discovery. Inform. Sci. (2018)
- et al. Predicting information diffusion probabilities in social networks: A Bayesian networks based approach. Knowl.-Based Syst. (2017)
- et al. Containment of rumor spread in complex social networks. Inform. Sci. (2020)
- et al. TPNE: Topology preserving network embedding. Inform. Sci. (2019)
- et al. Multilingual POS tagging by a composite deep architecture based on character-level features and on-the-fly enriched word embeddings. Knowl.-Based Syst. (2019)
- et al. Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering. Inform. Sci. (2020)
- et al. Low-rank local tangent space embedding for subspace clustering. Inform. Sci. (2020)
- et al. dyngraph2vec: Capturing network dynamics using dynamic graph representation learning. Knowl.-Based Syst. (2020)
- et al. Network embedding by fusing multimodal contents and links. Knowl.-Based Syst. (2019)
- et al. Can cascades be predicted?
- DeepCas: An end-to-end predictor of information cascades
- Collaborative inference of coexisting information diffusions
- Influence maximization on social graphs: A survey. IEEE Trans. Knowl. Data Eng.
- Prediction of information diffusion probabilities for independent cascade model
- A predictive model for the temporal dynamics of information diffusion in online social networks
- Topological recurrent neural network for diffusion prediction
- A novel embedding method for information diffusion prediction in social network big data. IEEE Trans. Ind. Inf.
- Learning social network embeddings for predicting information diffusion
- Inferring networks of diffusion and influence. ACM Trans. Knowl. Discov. Data
- Representation learning for information diffusion through social networks: an embedded cascade model
- IAD: Interaction-aware diffusion framework in social networks. IEEE Trans. Knowl. Data Eng.
- Information diffusion in online social networks: a survey. ACM SIGMOD Rec.
- Talk of the network: A complex systems look at the underlying process of word-of-mouth. Mark. Lett.
- The mathematical theory of infectious diseases and its applications. J. R. Stat. Soc. Ser. C. Appl. Stat.
- Information-theoretic measures of influence based on content dynamics
- Structural deep network embedding
- Attributed social network embedding. IEEE Trans. Knowl. Data Eng.
- Heterogeneous network embedding via deep architectures
- LINE: Large-scale information network embedding
☆ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105502.