Quantifying the dynamics of topical fluctuations in language
Language Dynamics and Change Pub Date : 2020-02-10 , DOI: 10.1163/22105832-01001200
Andres Karjus 1 , Richard A. Blythe 1 , Simon Kirby 1 , Kenny Smith 1

The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects and the fact that corpora are composed of contemporary media and fiction texts, within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a simple model that controls for topical fluctuations in corpora, the topical-cultural advection model, and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic corpus spanning two centuries and on a carefully controlled artificial language change scenario, and then use it to correct for topical fluctuations in historical time series. Finally, we use the model to show that the emergence of new words typically corresponds with the rise of a trending topic. This suggests that some lexical innovations occur due to growing communicative need in a subspace of the lexicon, and that the topical-cultural advection model can be used to quantify this.
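
The abstract does not give the model's formula, so the following is only a rough sketch of the general idea, under stated assumptions: a word's "advection" is taken here to be a weighted mean of the log frequency changes of the context words most associated with it, and the topic-adjusted change is taken to be a simple difference. The function names, the smoothing constant, the association weights and the subtraction step are all hypothetical illustrations, not the authors' implementation.

```python
import math


def log_change(freq_then, freq_now, smoothing=1.0):
    """Log ratio of smoothed frequencies between two time periods.

    The +smoothing offset is an assumption made here to avoid log(0).
    """
    return math.log((freq_now + smoothing) / (freq_then + smoothing))


def advection(context_weights, freqs_then, freqs_now):
    """Weighted mean of the context words' log frequency changes.

    context_weights: hypothetical mapping {context_word: association_weight},
    standing in for whatever word-topic association measure is used.
    """
    total = sum(context_weights.values())
    if total == 0:
        raise ValueError("context weights must not sum to zero")
    weighted = sum(
        w * log_change(freqs_then.get(c, 0.0), freqs_now.get(c, 0.0))
        for c, w in context_weights.items()
    )
    return weighted / total


def topic_adjusted_change(word, context_weights, freqs_then, freqs_now):
    """Word's own log change minus its topic's advection.

    Plain subtraction is a simplification assumed for illustration only.
    """
    own = log_change(freqs_then.get(word, 0.0), freqs_now.get(word, 0.0))
    return own - advection(context_weights, freqs_then, freqs_now)


# Toy counts (invented for illustration): the word rises exactly as fast
# as its topic, so the topic-adjusted change is zero.
then = {"telegraph": 9.0, "wire": 9.0, "message": 9.0}
now = {"telegraph": 19.0, "wire": 19.0, "message": 19.0}
weights = {"wire": 2.0, "message": 1.0}
adjusted = topic_adjusted_change("telegraph", weights, then, now)
```

In this toy case the word's frequency doubles, but so do the frequencies of its context words, so the sketch attributes the whole rise to the topic rather than to the word itself.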




Updated: 2020-02-10