Zipfian regularities in “non-point” word representations
Information Processing & Management (IF 7.4), Pub Date: 2021-01-19, DOI: 10.1016/j.ipm.2021.102493
Furkan Şahinuç, Aykut Koç

Zipf’s law for word frequencies, one of the most common empirical regularities, is a power-law relation between the frequencies of words and their frequency ranks. We quantitatively study the semantic uncertainty of words through non-point, distribution-based word embeddings and reveal Zipfian regularities. The uncertainty of a word can increase because of polysemy, because the word has a “broad” meaning (compare the broader emotion with the narrower exasperation), or because of a combination of both. The variances of Gaussian embeddings are used to quantify the extent to which a word can be used in different senses or contexts. Using the variance information embedded in the non-point Gaussian embeddings, we quantitatively show that the semantic breadth of words also exhibits Zipfian patterns when polysemy is controlled. This outcome complements Zipf’s law of meaning distribution and the related meaning-frequency law by indicating the existence of Zipfian patterns: more frequent words tend to be generic, while less frequent ones tend to be specific. Results are also provided for two languages that belong to different language families, English and Turkish. Such regularities provide valuable information for extracting and understanding relationships between the semantic properties of words and word frequencies, and employing them can improve performance in various applications. We also propose a method that leverages the Zipfian regularity to improve the performance of baseline textual entailment detection algorithms. To the best of our knowledge, our approach is the first quantitative study that uses Gaussian embeddings to examine the relationships between word frequencies and semantic breadth.
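The abstract relates two measurable quantities to frequency rank: the Zipf power law f(r) ≈ C · r^(−s) between a word's frequency f and its rank r, and a variance-based semantic breadth score taken from Gaussian embeddings. The following is a minimal Python sketch, not the authors' code, under these assumptions: the exponent is estimated by ordinary least squares on log-transformed values (a quick, common estimate; maximum-likelihood fitting is generally more reliable on real corpora), and a word's breadth is taken to be the mean of the diagonal variances of its Gaussian embedding. The function names `fit_power_law`, `breadth_score` and `breadth_by_rank` are hypothetical, and the counts and variances are synthetic toy values, not data from the paper.

```python
import math
from statistics import fmean


def fit_power_law(ranks, values):
    """Fit values ~ C * rank**(-s) by ordinary least squares in log-log space.

    Returns (s, C): the fitted exponent s and the scale constant C.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]  # values must be strictly positive
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return -slope, math.exp(intercept)


def breadth_score(diag_variances):
    """Hypothetical breadth score: mean of a Gaussian embedding's diagonal variances."""
    return fmean(diag_variances)


def breadth_by_rank(counts, variances):
    """Rank words by descending frequency; return (ranks, breadth scores).

    counts:    dict mapping word -> corpus frequency
    variances: dict mapping word -> list of diagonal variances
    """
    words = sorted(counts, key=counts.get, reverse=True)
    ranks = list(range(1, len(words) + 1))
    return ranks, [breadth_score(variances[w]) for w in words]


# Sanity check on synthetic frequencies that follow f(r) = 1000 / r exactly.
ranks = list(range(1, 101))
s, C = fit_power_law(ranks, [1000.0 / r for r in ranks])
print(round(s, 3), round(C, 1))  # 1.0 1000.0

# Synthetic toy embeddings (not data from the paper): broader words get larger variances.
counts = {"emotion": 500, "anger": 120, "exasperation": 8}
variances = {
    "emotion": [0.9, 0.9, 0.9, 0.9],
    "anger": [0.5, 0.5, 0.5, 0.5],
    "exasperation": [0.2, 0.2, 0.2, 0.2],
}
r, breadth = breadth_by_rank(counts, variances)
s_breadth, _ = fit_power_law(r, breadth)
print(s_breadth > 0)  # True: breadth falls as frequency rank grows
```

With real data, the counts would come from a corpus and the variances from trained Gaussian embeddings. In this sketch, a positive fitted exponent for breadth corresponds to the direction the abstract reports, with broader (more generic) words concentrated at the top frequency ranks.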

Updated: 2021-01-19