Modeling multi-prototype Chinese word representation learning for word similarity
Complex & Intelligent Systems ( IF 5.0 ) Pub Date : 2021-08-04 , DOI: 10.1007/s40747-021-00482-y
Fulian Yin 1 , Yanyan Wang 1 , Jianbo Liu 1 , Marco Tosato 2

The word similarity task measures the similarity of any pair of words and is a basic building block of natural language processing (NLP). Existing methods are based on word embeddings, which fail to capture polysemy and depend heavily on the quality of the corpus. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity, built on a synonym knowledge base and comprising a knowledge representation module and a word similarity module. In the first module, we propose a dual attention mechanism that combines semantic information to jointly learn word knowledge representations. The MP-CWR model uses synonyms as prior knowledge to supplement the relationships between words, which helps address the challenge of semantic expression under insufficient data. In the word similarity module, we build a multi-prototype representation for each word, then compute and fuse the conceptual similarities of the two words to obtain the final result. Finally, we verify the effectiveness of our model against baseline models on three public data sets. The experiments also demonstrate the stability and scalability of MP-CWR under different corpora.
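
The idea of scoring two words through several prototypes each can be illustrated with a minimal sketch. It is not the MP-CWR implementation: the abstract does not specify how prototypes are built or how conceptual similarities are fused, so the function names, the toy vectors, and the `max` and `mean` fusion rules below are all assumptions chosen for illustration.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_prototype_similarity(protos_a, protos_b, fuse="max"):
    """Fuse the pairwise similarities between two words' prototype vectors.

    protos_a, protos_b: lists of vectors, one vector per sense (prototype).
    fuse: hypothetical fusion rule, "max" (closest pair of senses) or
          "mean" (average over all sense pairs). The paper's actual
          fusion rule is not given in the abstract.
    """
    sims = [cosine(p, q) for p, q in product(protos_a, protos_b)]
    if not sims:
        raise ValueError("each word needs at least one prototype")
    if fuse == "max":
        return max(sims)
    if fuse == "mean":
        return sum(sims) / len(sims)
    raise ValueError(f"unknown fusion rule: {fuse}")


# Toy example: a polysemous word with two senses versus a single-sense word.
word_a = [[1.0, 0.0], [0.0, 1.0]]  # two made-up prototype vectors
word_b = [[0.0, 1.0]]              # one made-up prototype vector
best_match = multi_prototype_similarity(word_a, word_b, fuse="max")
average = multi_prototype_similarity(word_a, word_b, fuse="mean")
```

With a single vector per word, the two senses of `word_a` would be blended into one point; keeping them separate lets the `max` rule pick out the sense that actually matches `word_b`.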




Updated: 2021-08-09