Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks,arXiv - CS - Digital Libraries



Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks
arXiv - CS - Digital Libraries Pub Date : 2020-08-30 , DOI: arxiv-2008.13099
Qingyun Sun, Hao Peng, Jianxin Li, Senzhang Wang, Xiangyu Dong, Liangxuan Zhao, Philip S. Yu and Lifang He

Name disambiguation aims to distinguish unique authors who share the same name. Existing name disambiguation methods typically exploit author attributes to improve disambiguation results. However, some discriminative author attributes (e.g., email and affiliation) may change because of graduation or a change of job, which causes the same author's papers to be split apart in digital libraries. Although these attributes may change, an author's co-authors and research topics do not change frequently over time, which means that papers published within a given period have similar text and relation information in the academic network. Inspired by this idea, we introduce the Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem. We divide papers into small blocks based on discriminative author attributes, and blocks belonging to the same author are merged according to the pairwise classification results of MA-PairRNN. MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning in a single framework. In addition to attribute and structure information, MA-PairRNN also exploits semantic information through meta-paths and generates node representations inductively, which makes it scalable to large graphs. Furthermore, a semantic-level attention mechanism is adopted to fuse the multiple meta-path-based representations. A Pseudo-Siamese network consisting of two RNNs takes two paper sequences in publication-time order as input and outputs their similarity. Results on two real-world datasets demonstrate that our framework achieves a significant and consistent performance improvement on the name disambiguation task. The results also show that MA-PairRNN performs well with a small amount of training data and generalizes better across different research areas.
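
The block-then-merge procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record fields (`email`, `year`, `coauthors`), the function names, and the 0.5 threshold are all hypothetical. A simple co-author Jaccard score stands in for the learned MA-PairRNN similarity, and a union-find structure is one assumed way to merge blocks that the pairwise scorer judges to belong to the same author.

```python
from collections import defaultdict
from itertools import combinations


def build_blocks(papers):
    """Group papers sharing a discriminative attribute (assumed here: email)."""
    blocks = defaultdict(list)
    for paper in papers:
        blocks[paper["email"]].append(paper)
    # Order each block by publication year, mirroring the time-ordered sequences.
    return [sorted(block, key=lambda p: p["year"]) for block in blocks.values()]


def coauthor_jaccard(block_a, block_b):
    """Placeholder scorer (NOT MA-PairRNN): Jaccard overlap of co-author sets."""
    a = set().union(*(p["coauthors"] for p in block_a))
    b = set().union(*(p["coauthors"] for p in block_b))
    return len(a & b) / len(a | b) if (a | b) else 0.0


def merge_blocks(blocks, score, threshold=0.5):
    """Merge blocks whose pairwise score reaches the (assumed) threshold."""
    parent = list(range(len(blocks)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(blocks)), 2):
        if score(blocks[i], blocks[j]) >= threshold:
            parent[find(i)] = find(j)

    merged = defaultdict(list)
    for i, block in enumerate(blocks):
        merged[find(i)].extend(block)
    return list(merged.values())


# Made-up records: one author changes email between 2016 and 2018.
papers = [
    {"id": "p1", "email": "a@uni1.edu", "year": 2015, "coauthors": {"Li", "Wang"}},
    {"id": "p2", "email": "a@uni1.edu", "year": 2016, "coauthors": {"Li", "Zhao"}},
    {"id": "p3", "email": "a@corp.com", "year": 2018, "coauthors": {"Li", "Wang", "Zhao"}},
    {"id": "p4", "email": "b@other.org", "year": 2017, "coauthors": {"Kim"}},
]

authors = merge_blocks(build_blocks(papers), coauthor_jaccard)
groups = sorted(sorted(p["id"] for p in group) for group in authors)
print(groups)
```

In this toy data the two blocks with different emails share the same co-authors and are merged, while the unrelated block stays separate. In the paper's setting the scorer would instead be the trained pairwise model operating on the two time-ordered paper sequences.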


Updated: 2020-09-07
