Statistical metrics for languages classification: A case study of the Bible translations
Chaos, Solitons & Fractals (IF 7.8), Pub Date: 2021-01-21, DOI: 10.1016/j.chaos.2021.110679
Ali Mehri, Maryam Jamaati

Automatic language classification makes an important contribution to linguistic research. Four statistical features related to long-range correlations are applied to classify the syntactic properties of languages. We calculate Zipf's exponent, Heaps' exponent, fractal dimension, and entropy for Bible translations into one hundred living languages from twenty-eight language families. The Bible conveys the same content regardless of its language, but differences in the grammatical rules of the languages lead to differences in the measures extracted from its translations. The results show that geographical distance and cultural differences can lead to statistical discrepancies. All features extracted from the Bible translations are normally distributed around their average values. This separates the languages into two groups: a majority of normal languages and a minority of abnormal ones. There is also an evident (anti)correlation between each pair of these metrics, arising from their respective mechanisms. The standard deviation of the statistical features over language families is affected by the geographical distance between the communities that speak those languages and by their cultural diversity.
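As a rough illustration of three of the four metrics named in the abstract, the sketch below estimates Zipf's exponent, Heaps' exponent, and the Shannon entropy of a tokenized text. It is not the authors' procedure: the abstract does not say how the exponents were fitted or how fractal dimension was defined, so fractal dimension is left out, and the plain least-squares fit in log-log space, the whitespace tokenization, and all function names here are assumptions made for the example.

```python
import math
from collections import Counter


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def zipf_exponent(tokens):
    """Exponent alpha in f(r) ~ r^(-alpha), fitted to the rank-frequency list."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = range(1, len(freqs) + 1)
    return -loglog_slope(ranks, freqs)


def heaps_exponent(tokens):
    """Exponent beta in V(n) ~ n^beta, fitted to the vocabulary growth curve."""
    seen = set()
    sizes = []
    for tok in tokens:
        seen.add(tok)
        sizes.append(len(seen))  # vocabulary size after reading this token
    return loglog_slope(range(1, len(tokens) + 1), sizes)


def shannon_entropy(tokens):
    """Shannon entropy, in bits per word, of the word-frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy input (assumption: whitespace tokenization); a real run would use a full translation.
text = "in the beginning god created the heaven and the earth and the earth was without form"
tokens = text.split()
print(zipf_exponent(tokens), heaps_exponent(tokens), shannon_entropy(tokens))
```

On a text this short the fitted exponents are not meaningful; stable estimates need a corpus the size of a full translation, and a more careful fit (for example, restricting the rank range) may be preferable to the unweighted regression used here.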




Updated: 2021-01-22