Simple and automated negative sampling for knowledge graph embedding
The VLDB Journal (IF 4.2), Pub Date: 2021-01-28, DOI: 10.1007/s00778-020-00640-7
Yongqi Zhang , Quanming Yao , Lei Chen

Negative sampling, which samples negative triplets from the non-observed ones in a knowledge graph (KG), is an essential step in KG embedding. Recently, the generative adversarial network (GAN) has been introduced into negative sampling. By sampling negative triplets with large gradients, these methods avoid the problem of vanishing gradients and thus obtain better performance. However, they make the original model more complex and harder to train. In this paper, motivated by the observation that negative triplets with large gradients are important but rare, we propose to directly keep track of them with a cache (termed NSCaching). In this way, our method acts as a "distilled" version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. However, how to sample from and how to update the cache are two critical questions. We propose to solve these issues by automated machine learning techniques. The automated version also covers GAN-based methods as special cases. A theoretical explanation of NSCaching is also provided, justifying its superiority over the fixed sampling scheme. Besides, we further extend NSCaching with the skip-gram model for graph embedding. Finally, extensive experiments show that our method gains significant improvements on various KG embedding models and the skip-gram model, and outperforms the state-of-the-art negative sampling methods.
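The cache-based idea summarized above can be sketched in a few lines of code: keep a small cache of high-scoring (hard) negative tails for each (head, relation) pair, draw negatives from it during training, and refresh it as the embeddings change. The snippet below is a minimal illustrative sketch only, not the authors' implementation: the TransE-style scoring function, the cache size, and the uniform sampling and update rules are simplifying assumptions, and the automated (AutoML) selection of these choices described in the paper is omitted.

```python
# Minimal sketch of cache-based negative sampling in the spirit of NSCaching.
# All numbers, the scoring function, and the update rule are illustrative
# assumptions, not the paper's actual configuration.
import numpy as np

rng = np.random.default_rng(0)

n_entities, n_relations, dim = 1000, 10, 50
cache_size = 30        # assumed size of the per-(head, relation) cache
n_candidates = 50      # assumed number of fresh random candidates per update

# Toy embeddings; a real KG embedding model would learn these.
ent_emb = rng.normal(size=(n_entities, dim))
rel_emb = rng.normal(size=(n_relations, dim))

def score(h, r, t):
    """TransE-style score: higher means a harder (more plausible) negative."""
    return -np.linalg.norm(ent_emb[h] + rel_emb[r] - ent_emb[t])

# One cache of candidate negative tail entities per (head, relation) pair.
cache = {}

def update_cache(h, r):
    """Refresh the cache with the highest-scoring candidates (a proxy for
    negatives with large gradients), mixing old entries with fresh samples."""
    old = cache.get((h, r), np.empty(0, dtype=int))
    fresh = rng.integers(0, n_entities, size=n_candidates)
    pool = np.concatenate([old, fresh])
    scores = np.array([score(h, r, t) for t in pool])
    cache[(h, r)] = pool[np.argsort(-scores)[:cache_size]]

def sample_negative(h, r):
    """Draw a negative tail uniformly from the cache."""
    if (h, r) not in cache:
        update_cache(h, r)
    return int(rng.choice(cache[(h, r)]))

# Usage: for a positive triplet (h, r, t), draw a hard negative tail,
# then refresh the cache periodically as the embeddings change.
h, r, t = 1, 2, 3
neg_t = sample_negative(h, r)
update_cache(h, r)
print("negative tail:", neg_t)
```

In the actual NSCaching method, how the cache is sampled from and how it is updated are themselves chosen by automated machine learning; the fixed uniform sampling and top-k update above merely illustrate the data flow.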




Updated: 2021-01-28