A Zipfian Approach to Words in Contexts: The Cases of Modern English and Chinese
Journal of Quantitative Linguistics ( IF 0.7 ) Pub Date : 2021-05-19 , DOI: 10.1080/09296174.2021.1926110
Jin Cong

ABSTRACT

The system-level complexity of language has been thoroughly investigated in terms of Zipf’s law, whose quantitative features have been shown to reflect text and language typology. This study extends the scope of Zipf’s law from the macroscopic scale of language to specific words in their contexts, with the aim of examining its potential as an indicator of word typology. The focus is confined to the high-frequency words of English and Chinese as found in the FLOB and LCMC corpora. The log–log rank-frequency distributions of the contextual words of these high-frequency words generally follow the linear function y = ax + b. Moreover, an adjusted version of the parameter a can help to distinguish the word classes of the words in question. The contextual information reflected by this Zipf-based index might be more important to the emergence of word classes in Chinese, which has no real inflection to serve as a word-class indicator. From a Zipfian perspective, the findings lend preliminary support to Saussure’s systems thinking regarding linguistic signs. They may also contribute to fields such as usage-based linguistics.
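As a rough illustration of the fitting step the abstract describes, the sketch below counts the words that occur near a target word, ranks them by frequency, and fits log10(frequency) = a * log10(rank) + b by ordinary least squares. The function names, the symmetric window of two tokens, the base-10 logarithm, and the plain least-squares fit are all assumptions made for the example. The abstract does not state how contexts were delimited, how the fit was computed, or how parameter a was adjusted, so the adjustment is not attempted here.

```python
import math
from collections import Counter


def context_counts(tokens, target, window=2):
    """Count words within `window` tokens of each occurrence of `target`.

    The window size is an assumed setting, not one taken from the paper.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts


def fit_loglog(counts):
    """Least-squares fit of log10(freq) = a * log10(rank) + b.

    Returns (a, b). Needs at least two distinct context words.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two context words to fit a line")
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


if __name__ == "__main__":
    # Toy input only; a real run would use a tokenized corpus.
    toy = "the cat sat on the mat and the dog sat on the rug".split()
    slope, intercept = fit_loglog(context_counts(toy, "the"))
    print(f"a = {slope:.3f}, b = {intercept:.3f}")
```

A slope a close to -1 would correspond to the classic Zipfian case. On real data, tied frequencies at the low end of the ranking can noticeably affect a least-squares fit, so the fitted values should be read with that in mind.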




Updated: 2021-05-19