Improving semantic change analysis by combining word embeddings and word frequencies
International Journal on Digital Libraries (IF 1.6), Pub Date: 2019-05-20, DOI: 10.1007/s00799-019-00271-6
Adrian Englhardt, Jens Willkomm, Martin Schäler, Klemens Böhm

Language is constantly evolving. As part of diachronic linguistics, semantic change analysis examines how the meanings of words evolve over time. Such semantic awareness is important for retrieving content from digital libraries. Recent research on semantic change analysis relying on word embeddings has yielded significant improvements over previous work. However, a recent and so far somewhat neglected observation is that the rate of semantic shift negatively correlates with word-usage frequency. In this article, we therefore propose SCAF, Semantic Change Analysis with Frequency. It abstracts from the concrete embeddings and includes word frequencies as an orthogonal feature. SCAF allows using different combinations of embedding type, optimization algorithm and alignment method. Additionally, we leverage existing approaches for time series analysis by using change detection methods to identify semantic shifts. In an evaluation with a realistic setup, SCAF achieves better detection rates than prior approaches: 95% instead of 51%. On the Google Books Ngram data set, our approach detects both known and previously unknown shifts for popular words.
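The general idea of treating embedding drift and word frequency as two features of one time series, and then running change detection over it, can be illustrated with a minimal sketch. This is not the SCAF implementation: the function names, the additive combination with a `weight` parameter, and the z-score detector are all assumptions chosen for brevity, and the embeddings are assumed to be already aligned across time slices.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def shift_scores(embeddings, frequencies, weight=1.0):
    """Score each step between consecutive time slices of one word.

    Hypothetical combination (an assumption, not taken from the paper):
    embedding distance plus `weight` times the absolute change in
    log frequency. Step i covers the transition from slice i to i + 1.
    """
    scores = []
    for t in range(1, len(embeddings)):
        dist = cosine_distance(embeddings[t - 1], embeddings[t])
        dfreq = abs(math.log(frequencies[t]) - math.log(frequencies[t - 1]))
        scores.append(dist + weight * dfreq)
    return scores

def detect_changes(scores, threshold=1.5):
    """Return the steps whose score lies more than `threshold`
    standard deviations above the mean (a simple stand-in for a
    real change detection method)."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std == 0:
        return []
    return [i for i, s in enumerate(scores) if (s - mean) / std > threshold]

# Toy data: one word over six time slices, with a meaning flip
# between slice 3 and slice 4 and a constant usage frequency.
embeddings = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 2
frequencies = [100, 100, 100, 100, 100, 100]

scores = shift_scores(embeddings, frequencies)
print(detect_changes(scores))  # [3], the step from slice 3 to slice 4
```

In practice the z-score detector would be replaced by an established change detection method, and the weighting between the two features would need to be tuned or learned rather than fixed at 1.0.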




Updated: 2019-05-20