Word Length Distribution in German Texts during the 17th-19th Century, Journal of Quantitative Linguistics



Journal of Quantitative Linguistics (IF 0.761), Pub Date: 2019-09-15, DOI: 10.1080/09296174.2019.1662536
Fei Lian, Yuan Li


ABSTRACT

Word length in German texts has been a frequently discussed issue in quantitative linguistics. Taken as a whole, however, most existing research focuses on literary texts and private letters, and the corpus used in each study is relatively small. This paper provides a time- and genre-based analysis of word length distribution in German using 360 texts originating between the 17th and 19th centuries. It aims to find a probability distribution that captures the German word length distribution well from a diachronic perspective, and to reveal the relationship between the word length distribution and boundary conditions such as the genre and the creation time of a text. Results indicate that the word length distribution in German texts written in different eras follows the 1-displaced hyper-Poisson distribution, whose parameters (a, b) are linked to these boundary conditions. This study corroborates that the word length distribution of a given language is consistent, owing to the constraints of the cognitive mechanism. In addition, the parameters of the probability distribution can be good indicators of the writing style as well as the creation time of a text.
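
For readers unfamiliar with the model named in the abstract, the following is a minimal sketch of the 1-displaced hyper-Poisson probability mass function in its standard textbook form, P(x) = a^(x-1) / (b^((x-1)) * 1F1(1; b; a)) for x = 1, 2, ..., where b^((k)) = b(b+1)...(b+k-1) is the rising factorial and 1F1 is the confluent hypergeometric function. The function names, the truncation at max_len, and the parameter values a = 2.0, b = 1.0 are illustrative assumptions. They are not taken from the paper, and the paper's own fitting procedure is not described on this page.

```python
def hyper_poisson_1_displaced(a, b, max_len=60):
    """Return [P(1), ..., P(max_len)] for the 1-displaced hyper-Poisson model.

    The normalizing constant 1F1(1; b; a) is approximated by the same
    truncated series, so the returned probabilities sum to 1 by construction.
    """
    terms = []
    term = 1.0  # k = 0: a^0 divided by the empty rising factorial
    for k in range(max_len):
        terms.append(term)       # a^k / (b (b+1) ... (b+k-1))
        term *= a / (b + k)      # step from k to k + 1
    total = sum(terms)           # truncated 1F1(1; b; a)
    return [t / total for t in terms]


def mean_length(probs):
    """Expected word length; probs[i] is the probability of length i + 1."""
    return sum((i + 1) * p for i, p in enumerate(probs))


# Hypothetical parameter values, chosen only to illustrate the shape.
probs = hyper_poisson_1_displaced(a=2.0, b=1.0)
for length, p in enumerate(probs[:6], start=1):
    print(f"P(length={length}) = {p:.4f}")
print(f"mean length = {mean_length(probs):.4f}")
```

With b = 1 the rising factorial becomes k! and the model reduces to a 1-displaced Poisson distribution, which gives a quick sanity check on the implementation. In practice a and b would be estimated from observed word-length frequencies for each text.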


Updated: 2019-09-15
