Beyond word embeddings: learning entity and concept representations from large scale knowledge bases
Information Retrieval Journal (IF 2.5) Pub Date: 2018-08-11, DOI: 10.1007/s10791-018-9340-3
Walid Shalaby, Wlodek Zadrozny, Hongxia Jin

Text representations using neural word embeddings have proven effective in many NLP applications. Recent research adapts traditional word embedding models to learn vectors for multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating knowledge about concepts from two large-scale knowledge bases with different structures (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to seamlessly learn from the knowledge in the Wikipedia text and the Probase concept graph. We evaluate our concept embedding models on two tasks: (1) analogical reasoning, where we achieve state-of-the-art performance of 91% on semantic analogies, and (2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study evaluating our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions compared to tedious and error-prone methods that depend on gazetteers and regular expressions.
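The sketch below illustrates, in broad strokes, the kind of pipeline the abstract describes: a skip-gram model trained over text in which multiword concept/entity mentions have been collapsed into single tokens, followed by the standard vector-offset method for analogical reasoning. It is not the authors' released code; the corpus file name, the underscore-joined concept-token convention, and the hyperparameters are assumptions for illustration only.

```python
# Minimal sketch of concept embeddings via skip-gram (assumptions noted above).
from gensim.models import Word2Vec

# Hypothetical preprocessed corpus: each line is a sentence whose concept
# mentions have already been merged into single tokens (e.g. "barack_obama").
with open("wikipedia_with_concept_tokens.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

# sg=1 selects the skip-gram architecture the paper adapts.
model = Word2Vec(sentences, vector_size=300, window=5, sg=1,
                 min_count=5, workers=4)

# Analogical reasoning by vector offset:
# vec("paris") - vec("france") + vec("germany") should be closest to "berlin".
result = model.wv.most_similar(positive=["germany", "paris"],
                               negative=["france"], topn=1)
print(result)
```

The same trained vectors could also serve the concept categorization task, e.g. by clustering concept vectors and comparing cluster assignments against gold categories.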

Updated: 2018-08-11