Author Name Disambiguation on Heterogeneous Information Network with Adversarial Representation Learning

arXiv - CS - Digital Libraries, Pub Date: 2020-02-23, DOI: arxiv-2002.09803
Haiwen Wang, Ruijie Wang, Chuan Wen, Shuhao Li, Yuting Jia, Weinan Zhang, Xinbing Wang

Author name ambiguity causes inadequacy and inconvenience in academic information retrieval, which raises the necessity of author name disambiguation (AND). Existing AND methods fall into two categories: models that focus on content information to distinguish whether two papers are written by the same author, and models that focus on relation information, representing it as edges on a network and quantifying the similarity among papers. However, the former require adequate labeled samples and informative negative samples and are ineffective at measuring high-order connections among papers, while the latter need complicated feature engineering or supervision to construct the network. We propose a novel generative adversarial framework that develops the two categories of models together: (i) the discriminative module distinguishes whether two papers are from the same author, and (ii) the generative module selects possibly homogeneous papers directly from the heterogeneous information network, which eliminates the complicated feature engineering. In this way, the discriminative module guides the generative module to select homogeneous papers, and the generative module generates high-quality negative samples that train the discriminative module to be aware of high-order connections among papers. Furthermore, a self-training strategy for the discriminative module and a random-walk-based generating algorithm are designed to make the training stable and efficient. Extensive experiments on two real-world AND benchmarks demonstrate that our model provides significant performance improvement over state-of-the-art methods.
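
The abstract does not give the details of the random-walk-based generating algorithm, so the following is only a minimal sketch of the general idea of selecting candidate papers by walking a heterogeneous network. Everything in it is an assumption made for illustration: the toy papers, the attribute types (`coauthor`, `venue`), the function names `build_network` and `sample_candidates`, and the use of plain visit counts to rank candidates. It is not the authors' method, and it omits the discriminative module entirely.

```python
import random
from collections import Counter, defaultdict


def build_network(papers):
    """Link each paper node to its attribute nodes (hypothetical toy schema)."""
    graph = defaultdict(set)
    for paper_id, attrs in papers.items():
        for attr_type, values in attrs.items():
            for value in values:
                attr_node = (attr_type, value)
                graph[("paper", paper_id)].add(attr_node)
                graph[attr_node].add(("paper", paper_id))
    return graph


def sample_candidates(graph, start_paper, num_walks=200, walk_length=4,
                      top_k=2, seed=0):
    """Rank other papers by how often uniform random walks from start_paper visit them."""
    rng = random.Random(seed)
    start = ("paper", start_paper)
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            # Sort neighbors so the walk is reproducible for a fixed seed.
            neighbors = sorted(graph[node])
            if not neighbors:
                break
            node = rng.choice(neighbors)
            if node[0] == "paper" and node != start:
                visits[node[1]] += 1
    return [paper for paper, _ in visits.most_common(top_k)]


# Toy data (invented): p1 and p2 share a coauthor and a venue,
# p3 shares only the venue, and p4 is not connected to p1 at all.
papers = {
    "p1": {"coauthor": ["A. Chen"], "venue": ["KDD"]},
    "p2": {"coauthor": ["A. Chen"], "venue": ["KDD"]},
    "p3": {"coauthor": ["B. Liu"], "venue": ["KDD"]},
    "p4": {"coauthor": ["C. Roy"], "venue": ["ICDE"]},
}
graph = build_network(papers)
candidates = sample_candidates(graph, "p1")
print(candidates)
```

In this sketch, papers reachable through shared coauthors or venues surface as candidates, while the unconnected paper `p4` can never be selected. In the framework described above, such candidates would presumably be scored by the discriminative module rather than ranked by raw visit counts.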

Updated: 2020-02-25
