TEAGS: time-aware text embedding approach to generate subgraphs
Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2020-06-03, DOI: 10.1007/s10618-020-00688-7
Saeid Hosseini, Saeed Najafipour, Ngai-Man Cheung, Hongzhi Yin, Mohammad Reza Kangavari, Xiaofang Zhou

Contagions (e.g., viruses and gossip) spread over the nodes of propagation graphs. We can use the temporal-textual content of nodes to compute edge weights and generate subgraphs of highly relevant nodes, which benefits many applications. Yet challenges abound. First, the propagation pattern between each pair of nodes may change over time. Second, the same contagion does not always propagate. Hence, current text mining approaches, including topic modeling, cannot effectively compute the edge weights. Third, since propagation is affected by time, word–word co-occurrence patterns may differ across temporal dimensions, which adversely affects the performance of word embedding approaches. We argue that multi-aspect temporal dimensions (hour, day, etc.) should be considered to better calculate the correlation weights between nodes. In this work, we devise a novel framework that, on the one hand, integrates a time-aware word embedding component to construct word vectors through multiple temporal facets and, on the other hand, uses a time-only multi-facet generative model to compute the weights. Subsequently, we propose a Max-Heap Graph cutting algorithm to generate subgraphs. We validate our model through experiments on real-world datasets. The results show that our model generates subgraphs more effectively than its rivals and that temporal dynamics must be accounted for when modeling dynamical processes.
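The abstract does not spell out how the Max-Heap Graph cutting step works, so the following is only a minimal illustrative sketch of one way a max-heap over edge weights could drive subgraph generation; it is not the paper's algorithm. The function name `heap_cut_subgraphs`, the `min_weight` threshold, and the use of union-find to collect components are all assumptions made for this example.

```python
import heapq
from collections import defaultdict


def heap_cut_subgraphs(edges, min_weight):
    """Hypothetical sketch: keep edges with weight >= min_weight and
    return the resulting connected groups of nodes as subgraphs."""
    # heapq is a min-heap, so negate weights to pop the heaviest edge first.
    heap = [(-w, u, v) for u, v, w in edges]
    heapq.heapify(heap)

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Register every node so weakly linked nodes become singleton subgraphs.
    for u, v, _ in edges:
        find(u)
        find(v)

    while heap:
        neg_w, u, v = heapq.heappop(heap)
        if -neg_w < min_weight:
            break  # every remaining edge is weaker, so all of them are cut
        parent[find(u)] = find(v)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Toy edge weights (made up for illustration, not from the paper).
edges = [("a", "b", 0.9), ("b", "c", 0.2), ("c", "d", 0.8), ("d", "e", 0.7)]
subgraphs = heap_cut_subgraphs(edges, min_weight=0.5)
```

Popping edges in descending weight order lets the loop stop at the first edge below the threshold, which is the main reason to use a heap here rather than scanning all edges.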

Updated: 2020-06-03