The South African directory enquiries (SADE) name corpus
Language Resources and Evaluation (IF 1.7), Pub Date: 2019-02-06, DOI: 10.1007/s10579-019-09448-6
Jan W. F. Thirion, Charl van Heerden, Oluwapelumi Giwa, Marelie H. Davel

We present the design and development of a South African directory enquiries corpus. It contains audio and orthographic transcriptions of a wide range of South African names produced by first-language speakers of four languages, namely Afrikaans, English, isiZulu and Sesotho. Useful as a resource to understand the effect of name language and speaker language on pronunciation, this is the first corpus to also aim to identify the “intended language”: an implicit assumption with regard to word origin made by the speaker of the name. We describe the design, collection, annotation, and verification of the corpus. This includes an analysis of the algorithms used to tag the corpus with meta information that may be beneficial to pronunciation modelling tasks.

Updated: 2019-02-06