Radical and Stroke-Enhanced Chinese Word Embeddings Based on Neural Networks
Neural Processing Letters ( IF 3.1 ) Pub Date : 2020-07-04 , DOI: 10.1007/s11063-020-10289-6
Shirui Wang , Wenan Zhou , Qiang Zhou

The internal structural information of words has proven to be very effective for learning Chinese word embeddings. However, most previous attempts extracted only a single form of internal feature to learn representations, ignoring the comprehensive combination of such information. They also focused only on the explicit features of internal structures, even though these structures also carry implicit semantics of words. In this paper, we propose Radical and Stroke-enhanced Word Embeddings (RSWE), a novel method based on neural networks for learning Chinese word embeddings with joint guidance from semantic and morphological internal information. RSWE enables an embedding model to learn simultaneously from (1) implicit semantic information that is exploited from the radicals, and (2) stroke n-gram information that can be explicitly obtained from Chinese words. In the learning process, RSWE uses stroke n-grams to capture the local structural features of words, and integrates the implicit information exploited from radicals to enhance the semantics of the embeddings. Through this combination procedure, the semantics of Chinese words are effectively transferred into the learned embeddings. We evaluate the effectiveness of RSWE on word similarity computation, word analogy reasoning, performance across embedding dimensions, performance across training corpus sizes, and named entity recognition tasks. The experimental results show that our model outperforms existing state-of-the-art approaches.
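
To illustrate what "stroke n-grams" can look like in practice, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the stroke encoding or the n-gram range, so the five-class stroke coding, the tiny `TOY_STROKES` table, the function name `stroke_ngrams`, and the `min_n`/`max_n` defaults are all assumptions made for illustration.

```python
# Hypothetical stroke coding (an assumption, not taken from the abstract):
# 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = right-falling, 5 = turning
# Toy table covering only two characters; a real system would need a full
# character-to-stroke dictionary.
TOY_STROKES = {
    "大": [1, 3, 4],  # horizontal, left-falling, right-falling
    "人": [3, 4],     # left-falling, right-falling
}


def stroke_ngrams(word, min_n=3, max_n=5, table=TOY_STROKES):
    """Return all stroke n-grams of a word for n in [min_n, max_n].

    The word's characters are first flattened into one stroke sequence,
    then a sliding window of each size n is moved across that sequence.
    """
    strokes = [s for ch in word for s in table[ch]]
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(strokes) - n + 1):
            grams.append("".join(str(s) for s in strokes[i:i + n]))
    return grams


print(stroke_ngrams("大人"))
# ['134', '343', '434', '1343', '3434', '13434']
```

In a model of the kind the abstract describes, each such n-gram would typically be given its own vector, with radical-derived information combined in separately; how RSWE does this is not stated in the abstract.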




Updated: 2020-07-05