Systematic Homonym Detection and Replacement Based on Contextual Word Embedding, Neural Processing Letters


Neural Processing Letters (IF 3.1), Pub Date: 2020-10-20, DOI: 10.1007/s11063-020-10376-8
Younghoon Lee

Homonyms are words that share a spelling but differ in meaning, and they are common in most languages. They are a source of noise in most text analyses and are difficult to detect, and numerous studies have addressed this problem. However, existing methods typically detect homonyms with rule-based or statistical approaches, which require an answer set and pay little regard to the semantic meaning of the word. We therefore propose a novel approach to homonym detection based on contextual word embedding, which allows a word to be understood from the context in which it appears. In this study, we extracted all contextual word embedding vectors of each individual word and clustered those vectors using spherical k-means clustering to detect pairs of homonyms. In addition, we developed a homonym replacement method, based on the word vector values, to improve the performance of a document embedding technique. Using the proposed detection method, we replaced the embedding vectors of homonyms with a representative vector for each respective meaning. Experimental results indicate that the proposed method effectively detects homonyms and significantly improves the performance of document embedding.
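The detect-then-replace idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the embedding model, the number of clusters, or the criterion for declaring a word a homonym. Here, synthetic vectors stand in for the contextual embeddings of one word's occurrences, and the farthest-first initialization, `k=2`, and the `SIM_THRESHOLD` cutoff are all assumptions made for illustration.

```python
import numpy as np


def spherical_kmeans(X, k, n_iter=50):
    """Cluster the rows of X by cosine similarity.

    Returns (labels, centroids), where the centroids are unit vectors.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto the unit sphere

    # Deterministic farthest-first initialization (an assumption of this sketch):
    # start from the first row, then repeatedly add the row least similar
    # to the centroids chosen so far.
    chosen = [0]
    while len(chosen) < k:
        max_sim = (X @ X[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(max_sim)))
    centroids = X[chosen].copy()

    for _ in range(n_iter):
        labels = np.argmax(X @ centroids.T, axis=1)  # assign by highest cosine
        updated = centroids.copy()
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the old centroid if the cluster is empty
                updated[j] = total / norm
        if np.allclose(updated, centroids):
            break
        centroids = updated

    labels = np.argmax(X @ centroids.T, axis=1)
    return labels, centroids


# Synthetic stand-in for the contextual embeddings of ONE word:
# 20 occurrences near one sense direction, 20 near another.
rng = np.random.default_rng(0)
dim, n_per_sense = 8, 20
sense_a = np.eye(dim)[0]
sense_b = np.eye(dim)[1]
vectors = np.vstack([
    sense_a + 0.05 * rng.normal(size=(n_per_sense, dim)),
    sense_b + 0.05 * rng.normal(size=(n_per_sense, dim)),
])

labels, centroids = spherical_kmeans(vectors, k=2)

# Hypothetical decision rule: treat the word as a homonym when the two
# cluster centroids point in clearly different directions.
SIM_THRESHOLD = 0.5
is_homonym = float(centroids[0] @ centroids[1]) < SIM_THRESHOLD

# Replacement step: each occurrence takes the representative (centroid)
# vector of the sense cluster it was assigned to.
replaced = centroids[labels] if is_homonym else vectors
print(is_homonym, np.bincount(labels))
```

Clustering by cosine similarity rather than Euclidean distance is what makes the k-means variant "spherical": every vector and every centroid is normalized to unit length, so only direction matters. In a real pipeline the `vectors` array would hold one contextual embedding per occurrence of the target word in the corpus.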


Updated: 2020-10-20
