Document Network Projection in Pretrained Word Embedding Space, arXiv - CS - Information Retrieval


arXiv - CS - Information Retrieval. Pub Date: 2020-01-16, DOI: arxiv-2001.05727
Antoine Gourru, Adrien Guille, Julien Velcin and Julien Jacques

We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g., a citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities that provides complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and then use the similarities to alter this average representation. The resulting document representations can help solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identify relevant keywords to describe document classes.
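The two steps named in the abstract (average the word vectors of each document, then adjust that average with the pairwise similarities) can be illustrated with a minimal NumPy sketch. This is not the authors' exact formulation, which the abstract does not give. The function names, the mixing weight `lam`, the row normalisation of the similarity matrix, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def average_word_vectors(doc_term_counts, word_vectors):
    """Return one vector per document: the count-weighted mean of its word vectors."""
    counts = np.asarray(doc_term_counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # empty documents stay at the zero vector
    return (counts / totals) @ np.asarray(word_vectors, dtype=float)


def smooth_with_similarities(doc_vectors, similarities, lam=0.5):
    """Blend each document vector with the similarity-weighted mean of the others.

    `lam` is a hypothetical mixing weight in [0, 1] (an assumption, not a
    parameter taken from the paper); lam=0 keeps the plain average.
    """
    doc_vectors = np.asarray(doc_vectors, dtype=float)
    sims = np.asarray(similarities, dtype=float)
    row_sums = sims.sum(axis=1, keepdims=True)
    isolated = (row_sums == 0).ravel()
    row_sums[row_sums == 0] = 1.0  # avoid division by zero
    neighbor_mean = (sims / row_sums) @ doc_vectors
    # Documents with no similar neighbours keep their own average unchanged.
    neighbor_mean[isolated] = doc_vectors[isolated]
    return (1.0 - lam) * doc_vectors + lam * neighbor_mean


# Toy data (hypothetical): 4 vocabulary words in a 2-dimensional embedding space.
word_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
# Rows are documents, columns are word counts.
doc_term_counts = np.array([[1, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]])
# Documents 0 and 1 cite each other; document 2 is isolated.
similarities = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])

averaged = average_word_vectors(doc_term_counts, word_vectors)
projected = smooth_with_similarities(averaged, similarities, lam=0.5)
print(projected)
```

Because the adjusted vectors are still combinations of word vectors, they remain in the word embedding space, which is what makes it possible to compare a document (or a class of documents) directly with individual words when looking for descriptive keywords.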


Updated: 2020-01-17
