Automatic transcription of the Polish newsreel
Poznan Studies in Contemporary Linguistics (IF 0.5), Pub Date: 2019-06-26, DOI: 10.1515/psicl-2019-0008
Danijel Koržinek, Krzysztof Wołk, Łukasz Brocki, Krzysztof Marasek

Abstract: This paper describes an automatic transcription system for the Polish Newsreel, a collection of mid-to-late 20th-century news segments presented in audio and video form. The segments are characterized by archaic language and poor audio quality, which make them a demanding problem for speech recognition systems. The acoustic and language models had to be retrained on data from in-domain corpora. During adaptation, experiments were carried out to select the optimal adaptation parameters. These experiments showed that adapting a speech recognition system to a narrow, clearly defined domain significantly increases its efficiency. The final word error rate obtained for this domain was 10.97%.
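
The abstract reports its result as a word error rate (WER). For reference, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the sample sentences are assumptions made for illustration, and real evaluations usually normalize case and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Hypothetical helper for illustration; it splits on whitespace only and
    performs no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substitution plus one deletion over five words.
    wer = word_error_rate("the news was read aloud", "the news is read")
    print(f"WER: {wer:.2%}")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.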


Updated: 2019-06-26
