Semi-supervised Network Embedding with Text Information, Pattern Recognition



Semi-supervised Network Embedding with Text Information
Pattern Recognition (IF 7.5), Pub Date: 2020-08-01, DOI: 10.1016/j.patcog.2020.107347
Maoguo Gong, Chuanyu Yao, Yu Xie, Mingliang Xu

Abstract: Network embedding plays a pivotal role in network analysis because it encodes each node as a low-dimensional dense feature vector. However, most existing network embedding approaches focus only on preserving the structural information in the network and ignore the text features and category attributes of nodes, both of which are important to network analysis. In this paper, we propose an innovative semi-supervised network embedding (SNE) model that integrates structural information, text features and category attributes into the embedding vectors simultaneously. Specifically, we design a structure-preserving module and a text representation module to capture the global structural information and the text features, respectively. Meanwhile, a label indicator matrix and a supervised loss are proposed to preserve category information and to map nodes of the same class closer together. We use stacked auto-encoders to capture the highly nonlinear characteristics of the network. The embedding vectors are learned by jointly optimizing the reconstruction loss and the designed supervised loss in the proposed semi-supervised model. Extensive experiments on real-world datasets demonstrate that our method is superior to the state-of-the-art baselines in a variety of tasks, including visualization, node classification and clustering.
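The abstract does not give the loss formulas, so the following is only a minimal NumPy sketch of how a joint objective of this kind could look, not the authors' formulation. It assumes that unlabeled nodes are marked with -1, that the label indicator matrix has a 1 for every pair of distinct labeled nodes sharing a class, that the supervised loss is a sum of squared embedding distances over those pairs, and that a hypothetical weight `alpha` balances the two terms. All function names are illustrative.

```python
import numpy as np

def label_indicator(labels):
    """Assumed form: S[i, j] = 1 if i != j and both nodes are labeled
    with the same class; unlabeled nodes are marked with -1."""
    labels = np.asarray(labels)
    known = labels >= 0
    same = labels[:, None] == labels[None, :]
    S = (same & known[:, None] & known[None, :]).astype(float)
    np.fill_diagonal(S, 0.0)
    return S

def reconstruction_loss(X, X_hat):
    """Squared error between the auto-encoder input and its reconstruction."""
    return float(np.sum((X - X_hat) ** 2))

def supervised_loss(H, S):
    """Sum over pairs of S[i, j] * ||h_i - h_j||^2, which is small when
    same-class nodes lie close together in the embedding space."""
    sq = np.sum(H ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (H @ H.T)
    return float(np.sum(S * dist))

def joint_loss(X, X_hat, H, S, alpha=1.0):
    """Reconstruction loss plus alpha times the supervised loss
    (alpha is a hypothetical trade-off weight)."""
    return reconstruction_loss(X, X_hat) + alpha * supervised_loss(H, S)

# Toy example: four nodes, the last one unlabeled.
labels = [0, 0, 1, -1]
H = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [2.0, 2.0]])
X = np.eye(4)
S = label_indicator(labels)
total = joint_loss(X, X.copy(), H, S, alpha=0.5)
```

In this toy case only nodes 0 and 1 share a known class, so only their distance contributes to the supervised term; the unlabeled node affects the reconstruction term alone, which is what makes such an objective semi-supervised.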


Updated: 2020-08-01
