RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications
arXiv - CS - Sound, Pub Date: 2020-09-11, DOI: arxiv-2009.05493
Adriana Stan

Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good-quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the degree of orthographic transparency and the number of phonetic entries vary across the languages, the DNN hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and the trained G2P models are made freely available.
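
The abstract reports phoneme and word error rates for the G2P converters but does not define them. As a rough illustration only, the sketch below uses the definitions common in G2P evaluation: phoneme error rate as the total Levenshtein distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and word error rate as the fraction of words with at least one phoneme error. The function names and the toy lexicon are assumptions made for this example; they are not taken from RECOApy.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(references, hypotheses):
    """Return (PER, WER) for paired lists of phoneme sequences.

    Assumed definitions (not confirmed by the abstract):
    PER = total edits / total reference phonemes,
    WER = words with any error / number of words.
    """
    pairs = list(zip(references, hypotheses))
    total_edits = sum(edit_distance(r, h) for r, h in pairs)
    total_phones = sum(len(r) for r in references)
    wrong_words = sum(1 for r, h in pairs if list(r) != list(h))
    return total_edits / total_phones, wrong_words / len(references)


# Toy, made-up transcriptions used only to exercise the functions.
refs = [["k", "a", "t"], ["d", "o", "g"]]
hyps = [["k", "a", "t"], ["d", "o", "k"]]
per, wer = g2p_error_rates(refs, hyps)
print(f"PER={per:.3f} WER={wer:.3f}")
```

Under these definitions a single substituted phoneme in a long word barely moves the phoneme error rate but still counts as a full word error, which is why the two figures are usually reported together.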

Updated: 2020-09-16