Abstract

Network embedding that learns representations of network nodes plays a critical role in network analysis, since it enables many downstream learning tasks. Although various network embedding methods have been proposed, they are mainly designed for a single network scenario. This paper considers a “multiple network” scenario by studying the problem of fusing the node embeddings and incomplete attributes from two different networks. To address this problem, we propose to complement the incomplete attributes, so as to conduct data fusion via concatenation. Specifically, we first propose a simple inductive method, in which attributes are defined as a parametric function of the given node embedding vectors. We then propose its transductive variant by adaptively learning an adjacency graph to approximate the original network structure. Additionally, we also provide a light version of this transductive variant. Experimental results on four datasets demonstrate the superiority of our methods.

1. Introduction

Social network sites (SNSs, also commonly referred to as social networking services) are online platforms which provide users with various features to facilitate digital social interaction and information sharing [1, 2]. Over three billion users are currently active on various SNSs (like Facebook, Twitter, and QQ), spending on average two hours daily. These large and active SNSs naturally form an important part of the digital economy, making social network analysis [3, 4] a hot research topic in recent years.

Recently, network embedding [5], as a fundamental problem in network analysis, has aroused considerable research interest. Network embedding learns low-dimensional vector representations for network nodes. The learned vectorized representations, which preserve certain structural and content information of networks, can be easily combined with off-the-shelf learning algorithms for many social network analysis tasks such as node classification [6], link prediction [7], and diffusion prediction [8].

1.1. Problem

Although various network embedding methods have been proposed, they mainly focus on a single network scenario. In the era of big data, the related information from different networks should be fused together to facilitate applications. In this paper, we consider a “multiple network” scenario by studying the problem of fusing the node embeddings and incomplete attributes provided by two different networks.

As illustrated in Figure 1, this problem has practical importance. Imagine that you use Yelp (see Figure 1(a)), a popular review app, and try to sign in to your account there. Yelp allows you to sign in using your Facebook account. In addition, as the node (user) embeddings not only preserve certain characteristics of networks but also protect users’ privacy [9], Facebook may provide these embeddings to Yelp to facilitate its applications, e.g., cold-start recommendation. More importantly, as some Yelp users begin to write reviews, a very practical problem arises: is it possible to fuse the original node embeddings provided by Facebook and the reviews provided by Yelp to get new user embeddings (illustrated in Figure 1(b))?

1.2. Challenge and Solution

Certainly, one fundamental challenge is the incompleteness of attributes, i.e., only a small fraction of nodes are further provided with attributes. This challenge is very common. As reported in [10], the distribution of user activity tends to be long-tailed, suggesting that most social media content (like the reviews on Yelp) is actually written by a few active users. To address this, we propose to complement the incomplete attributes by defining attributes as a parametric function of the given node embedding vectors. This complement enables us to conduct data fusion via concatenation (illustrated in the bottom right corner of Figure 1(b)).

To obtain high-quality fusion results, we further propose a transductive method by adaptively learning an adjacency graph to approximate the original network structure. In particular, the adjacency graph is learned by jointly considering the given node embeddings and attribute knowledge. Additionally, we also provide a light version of the proposed transductive method. Specifically, for each node, this light version reduces its neighbor candidate set for efficient adjacency graph learning. We then conduct extensive experiments to verify the effectiveness of our methods.

In summary, our main contributions are as follows: (i) we study the problem of fusing node embeddings and incomplete attributes from two different networks; to the best of our knowledge, little work has addressed this problem; (ii) we propose a very simple and effective inductive method based on the idea of attribute complement; (iii) we further propose a transductive method POINTS and its light version, both of which obtain superior performance.

The remainder of the paper is organized as follows. We review related work in Sect. 2 and formalize the problem in Sect. 3. We present our method in Sect. 4, describe its optimization in Sect. 5, and provide some discussion in Sect. 6. We report experiments in Sect. 7. Finally, we conclude in Sect. 8.

2. Related Work

2.1. Network Embedding

Over the past few years, there has been a lot of interest in automatically learning useful node embeddings (i.e., features) from large-scale networks [5]. A representative work is DeepWalk [6], which performs random walks on a network to generate node sequences and then applies the skip-gram algorithm [11] to those sequences to learn the embeddings. Another well-known method is LINE, which preserves both the first-order proximity (i.e., the similarity between linked nodes) and the second-order proximity (i.e., the similarity between nodes with shared neighbors) of a network. In addition, researchers have also proposed some deep learning-based embedding models, such as SDNE [12] and GraphGAN [13]. Recently, many studies have considered network embedding with side information, such as node attributes. For example, by proving that DeepWalk is equivalent to matrix factorization, the work in [14] presents text-associated DeepWalk (TADW). GraphSAGE [15] employs graph convolutional networks [16] to aggregate features over node neighborhoods for network embedding. RSDNE [17] and RECT [18] further consider the problem of zero-shot graph embedding, i.e., the completely imbalanced label setting.

2.2. Data Fusion

Data fusion is the study of efficient methods for automatically transforming information from different sources and different points in time into a representation that provides effective support for human or automated decision making. Data fusion has proved useful in many disciplines, as discussed in [19, 20]. For example, in bioinformatics, jointly analyzing multiple datasets describing different organisms improves the understanding of biological processes [21]. In information retrieval, fusing the retrieval results from multiple search engines can significantly improve retrieval performance [22]. In biometric recognition systems, feature fusion can greatly improve recognition performance [23]. We refer to [24, 25] for a comprehensive survey.

However, little previous work considers the fusion of incomplete data or network embedding data. Our work fills this gap.

3. Problem Statement

The studied problem is defined as follows. We are given the node embeddings $U \in \mathbb{R}^{n \times d}$ of a network, where $n$ is the node number and the $i$-th row of $U$ (denoted as $u_i$) is a $d$-dimensional embedding vector of node $i$. On the other hand, another network further provides the attributes of $l$ ($l < n$) nodes: $\{x_i\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^{m}$ is the attribute vector of node $i$ and $m$ is the attribute feature number. Our goal is to fuse the given node embeddings and those incomplete attributes, so as to get the updated embeddings for all nodes. Note that, different from existing network embedding methods, the original network structure is unknown in our problem.

4. The Proposed Method

4.1. Fusion via Attribute Complement

Since only a small part of nodes are further provided with attributes, we cannot directly fuse node embeddings and attributes. To address this problem, we adopt a very simple complement strategy: predicting the missing attributes. In particular, for each node $i$ which is further provided with attributes, we assume that its node embedding $u_i$ should have the ability to generate its attribute vector $x_i$. The optimal generation function $g^{*}$ can be obtained by solving the following minimization problem:

$$g^{*} = \arg\min_{g} \ \sum_{i=1}^{l} \ell\big(g(u_i),\, x_i\big), \qquad (1)$$

where $\ell(\cdot,\cdot)$ is a loss function that measures the reconstruction error, such as the squared loss or hinge loss.

By solving the problem in Eq. (1), we obtain the generation function $g^{*}$. Then, for a node $i$ with no attributes, we can predict its attributes as $g^{*}(u_i)$. This complement enables us to conduct data fusion via concatenation. More details and discussion about the concatenation strategy can be found in Sect. 6.2.
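To make this step concrete, here is a minimal sketch of the inductive complement, assuming a linear ridge regressor as the generation function $g$ and randomly generated placeholder arrays in place of real embeddings and attributes; the shapes and the choice of regressor are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy setup (hypothetical shapes): n nodes with d-dim embeddings U,
# only the first l nodes come with m-dim attribute vectors X_known.
n, d, m, l = 1000, 128, 128, 200
rng = np.random.default_rng(0)
U = rng.normal(size=(n, d))            # given node embeddings
X_known = rng.normal(size=(l, m))      # attributes of the first l nodes

# Eq. (1): learn a parametric generation function g(u) ~ attributes,
# here instantiated as a linear model with squared loss (ridge regression).
g = Ridge(alpha=1.0).fit(U[:l], X_known)

# Complement the missing attributes of the remaining nodes.
X_hat = np.vstack([X_known, g.predict(U[l:])])

# Fuse by concatenation (Sect. 6.2): known attributes are kept as-is,
# predicted attributes fill in for the unattributed nodes.
fused = np.hstack([U, X_hat])          # shape (n, d + m)
```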

4.2. Transductive Attribute Prediction

The method formulated in Eq. (1) is inductive. In this section, we present a transductive method. Generally, transductive methods, which leverage the test data for model training, perform better than inductive methods [26]. For network embedding, classical transductive methods exploit all network nodes by preserving the inherent network structure in the embedding space, i.e., connected nodes tend to have similar embeddings [27, 28]. Although the original network structure is unknown, one can simply build a sparse adjacency graph $S \in \{0,1\}^{n \times n}$ to approximate it (we use the term “graph” to describe the recovered network structure, so as to avoid ambiguity with the original network structure), i.e., $S_{ij} = 1$ when node $j$ is among the $k$-nearest neighbors of node $i$ in the given node embedding space, and $S_{ij} = 0$ otherwise. This approximation can capture the intuition of transductive learning via the following cost term:

$$\min_{\hat{X}} \ \sum_{i,j} S_{ij}\, d(\hat{x}_i, \hat{x}_j), \qquad \text{s.t. } \hat{x}_i = x_i,\ \forall i \le l, \qquad (2)$$

where $d(\cdot,\cdot)$ is a distance function and $\hat{x}_i$ (the $i$-th row of matrix $\hat{X} \in \mathbb{R}^{n \times m}$) is the predicted attribute vector of node $i$. The imposed constraint ensures the predicted attributes to be consistent with the known attributes.
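For illustration, the $k$NN-based adjacency graph $S$ used in Eq. (2) could be built as follows; this is a sketch using scikit-learn's neighbor search, which is an implementation convenience rather than anything prescribed by the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_adjacency(U, k=10):
    """0/1 adjacency graph: S[i, j] = 1 iff j is among the k nearest
    neighbors of i in the given embedding space (a proxy for the
    unknown original network structure)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(U)
    _, idx = nn.kneighbors(U)              # idx[:, 0] is the node itself
    n = U.shape[0]
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    S[rows, idx[:, 1:].ravel()] = 1.0
    return S
```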

The adjacency graph plays a crucial role in this kind of graph-based transductive learning method [29, 30]. However, the matrix $S$ in Eq. (2) might not be the optimal adjacency graph. On the one hand, the original network information is only approximately described by the given node embeddings $U$, from which $S$ is built. On the other hand, the construction of $S$ ignores the attribute information, i.e., similar (dissimilar) attributes indicate similarity (dissimilarity) between different nodes. In this paper, we solve this problem in an adaptive way. Specifically, we propose to learn $S$ by jointly considering the given node embeddings and attribute knowledge. This yields the following cost term:

$$\min_{S,\, \hat{X}} \ \alpha \sum_{i,j} S_{ij}\, d(\hat{x}_i, \hat{x}_j) \;+\; \beta \sum_{i,j} S_{ij}\, d(u_i, u_j), \qquad \text{s.t. } s_i \mathbf{1} = k,\ S_{ij} \in \{0,1\},\ \hat{x}_i = x_i\ \forall i \le l, \qquad (3)$$

where $s_i$ is a vector with the $j$-th element as $S_{ij}$ (i.e., $s_i$ is the $i$-th row vector of matrix $S$), $\mathbf{1}$ denotes a column vector with all entries equal to one, and $\alpha$ and $\beta$ are two adjustable parameters. Intuitively, the first and second terms of Eq. (3) measure how well the adjacency graph fits the attributes and the given node embeddings, respectively.

The unified model POINTS: by jointly learning the attribute generation function (Eq. (1)) and the adjacency graph (Eq. (3)), the proposed method solves the following optimization problem:

$$\min_{g,\, \hat{X},\, S} \ \sum_{i=1}^{n} \ell\big(g(u_i),\, \hat{x}_i\big) \;+\; \alpha \sum_{i,j} S_{ij}\, d(\hat{x}_i, \hat{x}_j) \;+\; \beta \sum_{i,j} S_{ij}\, d(u_i, u_j), \qquad \text{s.t. } \hat{x}_i = x_i\ \forall i \le l,\ s_i \mathbf{1} = k,\ S_{ij} \in \{0,1\}. \qquad (4)$$

Since the key idea of this method is to learn the adjacency graph adaptively, we term our method adaPtively netwOrk embeddIng aNd aTtribute fuSion (POINTS).

A Light Version of POINTS: for each node $i$, to learn its optimal neighbors, POINTS needs to consider all $n$ nodes. This is very inefficient, as the network may be extremely large (some theoretical analysis can be found in Sect. 6.3). Therefore, we give a light version of POINTS. In particular, we propose to build a candidate neighbor set (denoted as $\mathcal{C}_i$) for each node $i$, where $c$ ($k \le c \ll n$) is the candidate neighbor number. Based on this idea, the light version solves the following optimization problem:

$$\min_{g,\, \hat{X},\, S} \ \sum_{i=1}^{n} \ell\big(g(u_i),\, \hat{x}_i\big) \;+\; \alpha \sum_{i} \sum_{j \in \mathcal{C}_i} S_{ij}\, d(\hat{x}_i, \hat{x}_j) \;+\; \beta \sum_{i} \sum_{j \in \mathcal{C}_i} S_{ij}\, d(u_i, u_j), \qquad \text{s.t. } \hat{x}_i = x_i\ \forall i \le l,\ s_i \mathbf{1} = k,\ S_{ij} \in \{0,1\},\ S_{ij} = 0\ \forall j \notin \mathcal{C}_i. \qquad (5)$$
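The paper does not spell out here how the candidate sets are constructed; one plausible choice, sketched below, is to take the $c$ nearest nodes of each node in the given embedding space, so that the later graph update only scans $c$ candidates per node instead of all $n$ nodes.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def candidate_sets(U, c=50):
    """One plausible way to build the candidate neighbor sets C_i:
    keep the c nearest nodes of each node in the given embedding space."""
    nn = NearestNeighbors(n_neighbors=c + 1).fit(U)
    _, idx = nn.kneighbors(U)
    return idx[:, 1:]                      # shape (n, c), excludes the node itself
```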

5. Optimization

The objective functions of POINTS (i.e., Eq. (4)) and its light version (i.e., Eq. (5)) both contain 0/1 constraints, which are difficult to handle with conventional optimization tools. In this section, we develop efficient solutions for these two problems.

5.1. Optimization for POINTS

Before deriving the optimization algorithm, we need to specify the choice of functions in Eq. (4). For simplicity, we choose a linear model for $g$, i.e., $g(u_i) = u_i W$, where $W \in \mathbb{R}^{d \times m}$ is the model parameter matrix. In addition, we adopt the squared loss for $\ell$ and the squared Euclidean distance for $d$, and denote the resulting objective of Eq. (4) by $\mathcal{J}$. As such, we can update the variables in Eq. (4) iteratively, as follows.

Update rule of $W$ and $\hat{X}$: by fixing the other variables, the partial derivative of $\mathcal{J}$ w.r.t. $W$ is

$$\frac{\partial \mathcal{J}}{\partial W} = 2\, U^{\top} (U W - \hat{X}). \qquad (6)$$

Therefore, we can update $W$ as $W \leftarrow W - \eta\, \frac{\partial \mathcal{J}}{\partial W}$, where $\eta$ is the learning rate.

When the other variables are fixed, we can obtain the partial derivative of $\mathcal{J}$ w.r.t. $\hat{X}$ as

$$\frac{\partial \mathcal{J}}{\partial \hat{X}} = 2\, (\hat{X} - U W) + 2\alpha\, L \hat{X}, \qquad (7)$$

where $L = D - (S + S^{\top})$, and $D$ is a diagonal matrix whose $i$-th diagonal element is $\sum_{j} (S_{ij} + S_{ji})$. Then, we can update $\hat{X}$ as $\hat{X} \leftarrow \hat{X} - \eta\, \frac{\partial \mathcal{J}}{\partial \hat{X}}$. After that, for each node $i$ with given attributes $x_i$, we adjust its predicted attributes as $\hat{x}_i = x_i$, so as to satisfy the constraint in Eq. (4).

Update rule of $S$: when the other variables are fixed, the original optimization problem reduces to

$$\min_{S} \ \sum_{i,j} S_{ij} \big( \alpha\, \|\hat{x}_i - \hat{x}_j\|_2^2 + \beta\, \|u_i - u_j\|_2^2 \big), \qquad \text{s.t. } s_i \mathbf{1} = k,\ S_{ij} \in \{0,1\}. \qquad (8)$$

Input: The given node embeddings $U$, the attribute information $\{x_i\}_{i=1}^{l}$, learning rate $\eta$, and parameters $\alpha$ and $\beta$;
Output: The final fusion results;
1: Initialize $W$, $\hat{X}$, and $S$;
2: repeat
3: Update $W$ as $W \leftarrow W - \eta\, \partial\mathcal{J}/\partial W$;
4: Update $\hat{X}$ as $\hat{X} \leftarrow \hat{X} - \eta\, \partial\mathcal{J}/\partial \hat{X}$;
5: Set $\hat{x}_i = x_i$ for each node $i$ with given attributes;
6: Update $S$ by solving problem (8);
7: until convergence or a maximum number of iterations;
8: Obtain the final fusion results via concatenation, as discussed in Sect. 6.2;
9: return The final fusion results.
Algorithm 1. POINTS.

As problem (8) is independent between different nodes, we can instead solve $n$ decoupled subproblems; for each node $i$:

$$\min_{s_i} \ \sum_{j} S_{ij} \big( \alpha\, \|\hat{x}_i - \hat{x}_j\|_2^2 + \beta\, \|u_i - u_j\|_2^2 \big), \qquad \text{s.t. } s_i \mathbf{1} = k,\ S_{ij} \in \{0,1\}. \qquad (9)$$

The optimal solution of problem (9) is (proved in Sect. 6.1)

$$S_{ij} = \begin{cases} 1, & j \in \mathcal{N}_i, \\ 0, & \text{otherwise}, \end{cases} \qquad (10)$$

where set $\mathcal{N}_i$ contains the top-$k$ nearest nodes to node $i$ in the network “embedding-attribute” space, in which the distance between nodes $i$ and $j$ is defined as $d_{ij} = \alpha\, \|\hat{x}_i - \hat{x}_j\|_2^2 + \beta\, \|u_i - u_j\|_2^2$.

We can iteratively update these three variables until convergence to obtain the final solution. After that, as discussed in Sect. 6.2, we can get the final fusion results by concatenation. For clarity, we summarize the complete fusion procedure in Alg. 1.
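For illustration, the following sketch mirrors Alg. 1 under the reconstruction used in this section (linear generator $g(u) = uW$, squared losses, and the dense top-$k$ graph update of Eq. (10)); it is a didactic approximation with hypothetical default parameters, not the authors' reference implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def update_graph(U, X_hat, k, alpha, beta):
    """Eq. (10): for each node, keep the k nearest nodes in the
    'embedding-attribute' space d_ij = alpha*||x_i - x_j||^2 + beta*||u_i - u_j||^2."""
    D = alpha * cdist(X_hat, X_hat, "sqeuclidean") + beta * cdist(U, U, "sqeuclidean")
    np.fill_diagonal(D, np.inf)                           # a node is not its own neighbor
    S = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, :k]
    np.put_along_axis(S, idx, 1.0, axis=1)
    return S

def points_fuse(U, X_known, k=10, alpha=1.0, beta=1.0, lr=1e-3, iters=20):
    """Rough sketch of Alg. 1: alternate gradient steps on W and X_hat,
    re-impose the attribute constraint, then re-solve the graph S."""
    n, d = U.shape
    l, m = X_known.shape
    W = np.zeros((d, m))
    X_hat = np.zeros((n, m))
    X_hat[:l] = X_known                                   # step 1: initialization
    S = update_graph(U, X_hat, k, alpha, beta)
    for _ in range(iters):                                # steps 2-7
        Lap = np.diag((S + S.T).sum(axis=1)) - (S + S.T)  # graph Laplacian of S
        W -= lr * 2.0 * U.T @ (U @ W - X_hat)             # step 3: update W
        X_hat -= lr * (2.0 * (X_hat - U @ W)              # step 4: update X_hat
                       + 2.0 * alpha * Lap @ X_hat)
        X_hat[:l] = X_known                               # step 5: enforce constraint
        S = update_graph(U, X_hat, k, alpha, beta)        # step 6: solve problem (8)
    return np.hstack([U, X_hat])                          # step 8: fuse by concatenation
```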

5.2. Optimization for the Light Version of POINTS

The optimization approach of the light version is very similar to that of POINTS in Sect. 5.1. The only difference is that, when updating $s_i$ with the other variables fixed, we only need to sort the nodes in $\mathcal{C}_i$ (node $i$'s neighbor candidate set) to get the top-$k$ nearest neighbors in the “embedding-attribute” space, so as to obtain the optimal solution of $s_i$.
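A sketch of this restricted update follows, assuming each candidate set is given as an index array that excludes the node itself.

```python
import numpy as np

def optimal_neighbors_light(i, cand, U, X_hat, k, alpha, beta):
    """Light-version update of s_i: rank only the candidate set C_i
    (index array `cand`) in the 'embedding-attribute' space, instead of
    scanning all n nodes."""
    d = (alpha * ((X_hat[cand] - X_hat[i]) ** 2).sum(axis=1)
         + beta * ((U[cand] - U[i]) ** 2).sum(axis=1))
    return cand[np.argsort(d)[:k]]        # indices j with S[i, j] = 1
```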

6. Algorithm Analysis

6.1. Optimization Algorithm Solving Problem (9)

Theorem 1. The optimal solution of problem (9) is Eq. (10).

Proof. By contradiction, suppose node $i$ has obtained an optimal neighbor set $\mathcal{N}_i^{*}$ which contains a node $p$ that is not among $i$'s top-$k$ nearest nodes in the “embedding-attribute” space. For convenience, we use $d_{ij}$ to denote the distance between nodes $i$ and $j$ in this space, i.e., $d_{ij} = \alpha\, \|\hat{x}_i - \hat{x}_j\|_2^2 + \beta\, \|u_i - u_j\|_2^2$. As such, there must exist a node $q \notin \mathcal{N}_i^{*}$ which is one of $i$'s top-$k$ nearest nodes in this space. Then, we get $d_{iq} < d_{ip}$. Considering our minimization problem (i.e., Eq. (9)), this inequality leads to

$$\sum_{j \in (\mathcal{N}_i^{*} \setminus \{p\}) \cup \{q\}} d_{ij} \;<\; \sum_{j \in \mathcal{N}_i^{*}} d_{ij}.$$

This indicates that $(\mathcal{N}_i^{*} \setminus \{p\}) \cup \{q\}$ is a better solution than $\mathcal{N}_i^{*}$, a contradiction.
Actually, we can generalize the above proof to a more general case.

Theorem 2. Suppose node $q$ is closer to node $i$ than node $p$. If the chosen distance function $d(\cdot,\cdot)$ also satisfies $d_{iq} < d_{ip}$ in this case, then the optimal solution of problem (9) is still given by Eq. (10) (which adopts the distance function $d$).

Proof. This conclusion can be proved by replacing the squared Euclidean distance function in the proof of Theorem 1 with $d(\cdot,\cdot)$.

6.2. Fusion Strategy

In this part, we discuss how to conduct data fusion based on the proposed attribute complement methods. The inductive method (described in Sect. 4.1) learns a generation function $g^{*}$; then, for each node $i$ without attributes, we can predict its attribute vector as $g^{*}(u_i)$. The two transductive methods (described in Sect. 4.2) directly produce the predicted attribute vectors $\hat{x}_i$. As such, the attributes are completed for fusion. Specifically, we adopt a simple concatenation “trick”: (1) if node $i$ has no attributes, we obtain its final fusion vector by concatenating $u_i$ and the predicted attribute vector; (2) if node $i$ has attributes, we obtain its final fusion vector by concatenating $u_i$ and $x_i$. The principle of this trick is that the given attributes are always more stable and accurate than the predicted attributes for node description.
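As a small illustration of this strategy (a sketch; the index bookkeeping is hypothetical):

```python
import numpy as np

def fuse(U, X_hat, X_known, attributed_idx):
    """Concatenation strategy of Sect. 6.2: attributed nodes keep their
    given attribute vectors, all other nodes use the predicted ones."""
    X_final = X_hat.copy()
    X_final[attributed_idx] = X_known     # given attributes take priority
    return np.hstack([U, X_final])        # final fusion vectors, shape (n, d + m)
```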

6.3. Time Complexity

The time complexity of Alg. 1 is as follows. The complexity for updating $W$ is $O(ndm)$. The complexity for updating $\hat{X}$ is $O(ndm + \mathrm{nnz}(S)\, m)$, where $\mathrm{nnz}(\cdot)$ is the number of nonzeros of a matrix. The complexity for updating $S$ is $O(n^{2}(d + m))$, because for each node we have to calculate its top-$k$ nearest neighbors among all $n$ nodes. As $\mathrm{nnz}(S)$ is linear in $n$, the overall complexity of POINTS is $O\big(T\,(ndm + n^{2}(d + m))\big)$, where $T$ is the number of iterations to converge.

For the light version, the complexity of updating $S$ becomes $O(nc(d + m))$, and all others remain the same. Hence, since $c \ll n$, the overall complexity becomes $O\big(T\,(ndm + nc(d + m))\big)$. As our method usually converges fast (within 20 iterations in our experiments) and $c$, $d$, and $m$ are much smaller than $n$, the complexity of the light version is linear in the number of nodes.

7. Experiments

Datasets: we conduct our experiments on four widely used citation network datasets: Citeseer [31], Cora [31], Wiki [32], and Pubmed [32]. In these networks, nodes are documents, and edges denote the citation relationships between them. Node attributes (i.e., features) are the bag-of-words representations of the documents. The statistics of these networks are shown in Table 1.

Experimental setting: as illustrated in Figure 1, for each dataset, we first obtain the original node embeddings and then provide some nodes with attributes for data fusion, so as to simulate fusing data from two different networks. Specifically, we first obtain the original node embeddings with the well-known network embedding method LINE, adopting its first-order proximity version LINE(1st); we also try other network embedding methods in Sect. 7.3. After obtaining the original node embeddings, we randomly select some nodes and provide them with attributes. Finally, we employ different fusion methods to obtain the final fusion results.
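A rough sketch of this simulation protocol follows; the arrays are random placeholders standing in for the LINE(1st) embeddings and the bag-of-words attributes, and the 50% split is just one of the attribute rates used later.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholders: U would come from LINE(1st), X_all from the documents'
# bag-of-words features (sizes here are roughly Citeseer-like).
n, d, m = 3312, 128, 3703
U = rng.normal(size=(n, d))
X_all = rng.random(size=(n, m))

# Randomly reveal attributes for a subset of nodes (here 50%) to simulate
# the attribute-providing second network.
attributed_idx = rng.choice(n, size=int(0.5 * n), replace=False)
X_known = X_all[attributed_idx]
```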

Baseline methods: since this incomplete data fusion problem has not been previously studied, there is no natural baseline to compare with. We thus compare our methods with methods that directly fuse the original given node embeddings and attributes, listed as follows: (1) LINE(1st): we adopt LINE(1st) to obtain the original node embeddings; this method neglects the incomplete node attributes (in Sect. 7.3, we also try more network embedding methods). (2) Attributes: we use the zero-padded attributes as fusion results; this method neglects the given node embeddings. (3) NaiveCombine: we simply concatenate the vectors from the given node embeddings and the zero-padded attributes.

For our method, we test its three different versions: POINTSind (the inductive version formulated in Eq. (1)), POINTS (the full transductive version formulated in Eq. (4)), and the light version (formulated in Eq. (5)).

Parameters: we follow the suggestion of LINE to set the embedding dimension to 128. In addition, following [14], we reduce the dimension of attributes by applying an SVD decomposition to the original text features; for simplicity, we also reduce this dimension to 128. In the proposed methods POINTS and its light version, we fix the parameters $\alpha$ and $\beta$ throughout our experiments, although adjusting them would yield better results. Besides, we simply set the neighbor number $k$ as in most graph-based transductive methods [33] and set the candidate number $c$ for the light version.
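The SVD-based attribute reduction could look like the following sketch (the paper only states that the text features are reduced to 128 dimensions; the TruncatedSVD call and the random stand-in matrix are assumed instantiations).

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Stand-in for the raw bag-of-words attribute matrix.
X_all = np.random.default_rng(0).random((3312, 3703))

# Reduce the attribute dimension to 128 before fusion.
svd = TruncatedSVD(n_components=128, random_state=0)
X_reduced = svd.fit_transform(X_all)      # shape (3312, 128)
```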

7.1. Node Classification

Following [6], we train one-vs-rest logistic regression classifiers to evaluate the fusion (i.e., the updated embedding) quality. Specifically, for Citeseer, Cora, and Wiki, we fix the label rate in the classifiers to 10%. Since Pubmed is a much larger dataset with fewer classes, we follow [34] to set the percentage of labeled data to 1%. In addition, we increase the rate of nodes with attributes from 10% to 90% on all datasets. Following [28], before evaluation, we normalize all representation vectors to unit length for a fair comparison. Figures 2 and 3 show the classification performance measured by micro-F1 and macro-F1 [35], respectively. We can draw the following three conclusions from these results.
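A sketch of this evaluation protocol (unit-length normalization, a one-vs-rest logistic regression trained on the labeled fraction, and micro/macro-F1 on the rest); the solver settings are assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import normalize

def evaluate(fused, labels, label_rate=0.1, seed=0):
    """Normalize the fused vectors to unit length, train a one-vs-rest
    logistic regression on `label_rate` of the nodes, and report
    micro-F1 and macro-F1 on the remaining nodes."""
    Z = normalize(fused)                               # unit-length rows
    Z_tr, Z_te, y_tr, y_te = train_test_split(
        Z, labels, train_size=label_rate, random_state=seed, stratify=labels)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Z_tr, y_tr)
    y_pred = clf.predict(Z_te)
    return (f1_score(y_te, y_pred, average="micro"),
            f1_score(y_te, y_pred, average="macro"))
```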

Firstly, all our methods (including POINTSind, POINTS, and the light version) outperform the baseline methods significantly. For example, on Citeseer with 50% attributes, POINTSind, which performs worst among our three methods, still outperforms LINE(1st) by 13%, Attributes by 8%, and NaiveCombine by 3%. Additionally, the improvements of our two transductive methods are even more remarkable. These results clearly demonstrate the effectiveness of our complement strategy.

Secondly, the proposed two transductive methods consistently outperform our inductive method POINTSind. Especially on Citeseer, Cora, and Pubmed, these two transductive methods generally outperform POINTSind by 5-12%. On the other hand, we also find that the improvement becomes less significant on Wiki. We conjecture that it may be hard to recover Wiki's original network structure from the given node embeddings and attributes; more specifically, this might be because Wiki (whose edge number is about eight times its node number) is much denser than the other three datasets.

Thirdly, the light version is comparable to the full POINTS on all datasets. This indicates that we can safely reduce the neighbor candidate set size for efficient transductive learning.

7.2. Visualization

Following [28], we use the t-SNE package [36] to visualize the final node representations obtained by different fusion methods. Without loss of generality, we choose the first dataset, Citeseer, and test the case with 50% attributes. Similar to [28], for a clear comparison, we visualize the nodes from three different research fields: IR, DB, and HCI. Figure 4 shows the visualization results.
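The visualization step could be reproduced along these lines (a sketch; the figure styling is arbitrary).

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(fused, labels, title):
    """Project the fused node vectors to 2-D with t-SNE and color the
    points by class, as in Figure 4."""
    xy = TSNE(n_components=2, random_state=0).fit_transform(fused)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=5, cmap="tab10")
    plt.title(title)
    plt.axis("off")
    plt.show()
```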

As shown in Figures 4(a)–4(c), the visualization results of the three baselines are not very meaningful: the points belonging to different categories are heavily mixed with each other. This is because these baselines cannot sufficiently utilize the incomplete attributes. In contrast, as shown in Figures 4(d)–4(f), the results of our three methods are much better (nodes with the same color are distributed closer together). In addition, compared to our inductive method POINTSind, our two transductive methods show a more meaningful layout. Specifically, the blue points in the POINTSind result are partly separated by the red points, while these two types of points are less mixed in the transductive results. To clarify the reason, we further visualize the predicted attributes of POINTSind and POINTS in Figures 4(g) and 4(h), respectively. We can clearly see that POINTS obtains higher-quality attributes, which explains the superiority of our transductive methods.

7.3. More Network Embedding Baselines

We evaluate the performance of our methods based on more network embedding methods. In particular, we further test another five network embedding methods as follows:

Without loss of generality, we fix the label rate to 10% and provide 50% of nodes with attributes. For convenience, we use “OrigEmb” to denote the original node embeddings obtained by the various network embedding methods.

Figure 5 shows the performance on Citeseer. We can clearly see that our methods (including POINTSind, POINTS, and the light version) consistently outperform the baselines by a large margin. On the other hand, the light version always achieves accuracy similar to that of the full POINTS. Taken together, these observations clearly indicate the effectiveness of our methods.

8. Conclusion

This paper investigates the problem of fusing node embeddings and incomplete attributes provided by two different networks. We develop both inductive and transductive variants of our method. Additionally, we also provide an efficient light version of the transductive variant. Extensive experiments have demonstrated the effectiveness of our methods. In the future, we plan to extend our method to fuse more types of related information from more networks and resources.

Data Availability

The datasets used in this paper can be found at https://linqs.soe.ucsc.edu/data.

Conflicts of Interest

The author(s) declare(s) that they have no conflicts of interest.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (No. 61902020) and Macao Youth Scholars Program (AM201912).