Susceptible-infected-spreading-based network embedding in static and temporal networks
EPJ Data Science (IF 3.0), Pub Date: 2020-10-16, DOI: 10.1140/epjds/s13688-020-00248-5
Xiu-Xiu Zhan, Ziyu Li, Naoki Masuda, Petter Holme, Huijuan Wang

Link prediction can be used to extract missing information, identify spurious interactions, and forecast network evolution. Network embedding is a methodology for assigning coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task: nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from those paths. The set of node pairs is then used as the input to a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose replacing the random walk with a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two susceptible-infected-spreading-based algorithms: Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. We evaluate the performance of our algorithms on the missing link prediction task, in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that SINE mostly performs better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. SINE and TSINE achieve their better performance with a smaller sampling size than the baseline algorithms require. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks.
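
The path-sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `si_sample_paths` and `node_pairs`, the infection probability `beta`, the stopping rule `max_infected`, the choice to record one root-to-node path per infected node, and the context `window` are all assumptions made for illustration. The abstract only states that an SI spreading process replaces the random walk and that node pairs drawn from the sampled paths feed a Skip-Gram model.

```python
import random


def si_sample_paths(adj, seed, beta=0.5, max_infected=None, rng=None):
    """Run one discrete-time SI spreading process from `seed` on a static
    network and return one path per infected node, traced back to the seed.

    adj: dict mapping each node to a list of its neighbours.
    beta: per-contact infection probability (assumed parameter, 0 < beta <= 1).
    max_infected: assumed stopping rule; defaults to the whole network.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    rng = rng or random.Random()
    limit = len(adj) if max_infected is None else max_infected
    parent = {seed: None}  # infected node -> node that infected it
    while len(parent) < limit:
        # Contacts between an infected node and a susceptible neighbour.
        frontier = [(u, v) for u in parent for v in adj[u] if v not in parent]
        if not frontier:
            break  # nothing left to infect in this component
        for u, v in frontier:
            if len(parent) >= limit:
                break
            if v not in parent and rng.random() < beta:
                parent[v] = u
    paths = []
    for node in parent:
        path, cur = [], node
        while cur is not None:  # walk back to the seed
            path.append(cur)
            cur = parent[cur]
        path.reverse()
        if len(path) > 1:
            paths.append(path)
    return paths


def node_pairs(paths, window=2):
    """Turn paths into (centre, context) node pairs, as a Skip-Gram model
    would consume them; `window` is an assumed context size."""
    pairs = []
    for path in paths:
        for i, u in enumerate(path):
            lo, hi = max(0, i - window), min(len(path), i + window + 1)
            pairs.extend((u, path[j]) for j in range(lo, hi) if j != i)
    return pairs


if __name__ == "__main__":
    # Toy network (hypothetical): a 4-node path graph plus one chord.
    toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    sampled = si_sample_paths(toy, seed=0, beta=0.5, rng=random.Random(1))
    print(sampled)
    print(node_pairs(sampled, window=1))
```

In a full pipeline, the resulting pairs would train a Skip-Gram model, and candidate links would then be ranked by the similarity of their endpoint vectors, as the abstract describes.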




Updated: 2020-10-19