Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery
arXiv - CS - Sound. Pub Date: 2021-05-04, DOI: arxiv-2105.01786. Authors: Thomas Glarner, Janek Ebbers, Reinhold Häb-Umbach
Discovering speaker-independent acoustic units purely from spoken input is
known to be a hard problem. In this work we propose an unsupervised speaker
normalization technique prior to unit discovery. It is based on separating
speaker-related from content-induced variations in a speech signal with an
adversarial contrastive predictive coding approach. This technique requires neither
transcribed speech nor speaker labels and, furthermore, can be trained
in a multilingual fashion, thus achieving speaker normalization even if only
little unlabeled data is available from the target language. The speaker
normalization is done by mapping all utterances to a medoid style that is
representative of the whole database. We demonstrate the effectiveness of the
approach by conducting acoustic unit discovery with a hidden Markov model
variational autoencoder, noting, however, that the proposed speaker
normalization can serve as a front end to any unit discovery system.
Experiments on English, Yoruba and Mboshi show improvements compared to using
non-normalized input.
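The abstract says normalization maps every utterance to a medoid style that is representative of the whole database. The sketch below illustrates only how such a medoid could be chosen. It assumes that each utterance already has a fixed-length style (speaker) embedding and that Euclidean distance is a suitable measure. The abstract specifies neither the embedding nor the distance, and the function names are hypothetical. The voice conversion step that actually maps utterances to this style is not shown.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def medoid_index(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the embedding whose summed distance to all
    embeddings in the set is smallest, i.e. the medoid of the set.

    Unlike a mean, the medoid is always one of the observed embeddings,
    so it corresponds to a real utterance's style.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    best_idx, best_cost = 0, math.inf
    for i, e_i in enumerate(embeddings):
        # Total distance from candidate i to every embedding (self-distance is 0).
        cost = sum(euclidean(e_i, e_j) for e_j in embeddings)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


if __name__ == "__main__":
    # Hypothetical 2-D style embeddings for four utterances.
    styles = [[0.0, 0.0], [1.0, 0.0], [0.9, 0.1], [5.0, 5.0]]
    print(medoid_index(styles))  # prints 2
```

This brute-force search costs O(N²) distance evaluations for N utterances, which is fine for a sketch. For a large database, one would likely subsample the utterances or use a vectorized pairwise-distance routine instead.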
Updated: 2021-05-06