Sequence comparison in computational historical linguistics
Journal of Language Evolution, published 2018-07-01. DOI: 10.1093/jole/lzy006
Johann-Mattis List, Mary Walworth, Simon J Greenhill, Tiago Tresoldi, Robert Forkel

With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multilingual word lists is becoming more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up cognate detection. Furthermore, it allows us to get a quick overview of data that have not yet been intensively studied by experts. LingPy is a Python library that provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only automatically search for cognates in lexical data, but also align the automatically identified words and output them in various formats designed to facilitate manual inspection. In this tutorial, we briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate, in concrete workflows, how automatic sequence comparison can be applied to multilingual word lists. The goal is to provide readers with all the information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithms for the appropriate task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, the tutorial is structured so that scholars with only basic computing knowledge can follow all steps as well.
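To make the idea of "aligning" words concrete, here is a minimal sketch of classic pairwise global alignment (the Needleman-Wunsch algorithm) over lists of sound segments. This is not LingPy's API and not the scoring LingPy uses; LingPy's alignment routines are considerably more elaborate. The function name `align`, the flat scoring scheme (match +1, mismatch -1, gap -1), and the toy segment lists are illustrative assumptions only.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Hypothetical helper: Needleman-Wunsch global alignment of two segment lists.

    Returns the two aligned sequences (gaps shown as "-") and the alignment score.
    Uses a flat match/mismatch/gap scheme, an assumption made for illustration.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        # substitution score for putting segment x opposite segment y
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two segments
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


# Usage with two made-up segment sequences (not real transcriptions)
top, bottom, total = align(list("toxter"), list("doter"))
print(" ".join(top))     # t o x t e r
print(" ".join(bottom))  # d o - t e r
print(total)             # 2
```

The gap placed opposite `x` is what lets the remaining segments line up; realistic tools replace the flat scores with linguistically informed ones, so that, for instance, related sounds score better than arbitrary mismatches.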

Updated: 2018-07-01