Digitising Swiss German: how to process and study a polycentric spoken language
Language Resources and Evaluation (IF 1.7), Pub Date: 2019-04-11, DOI: 10.1007/s10579-019-09457-5
Yves Scherrer, Tanja Samardžić, Elvira Glaser

Swiss dialects of German are, unlike many dialects of other standardised languages, widely used in everyday communication. Despite this, automatic processing of Swiss German remains a considerable challenge because it is mostly a spoken variety and is subject to substantial regional variation. This paper presents the ArchiMob corpus, a freely available, general-purpose corpus of spoken Swiss German based on oral history interviews. The corpus is the result of a long design process, intensive manual work and specially adapted computational processing. We first present how the corpus can be accessed for linguistic, historical and computational research. We then describe how the documents were transcribed, segmented and aligned with the audio source. This work involved a series of experiments that led to automatically annotated normalisation and part-of-speech tagging layers. Finally, we present several case studies that motivate the use of the corpus for digital humanities in general and for dialectology in particular.

Updated: 2019-04-11