Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
arXiv - CS - Sound Pub Date : 2020-07-02 , DOI: arxiv-2007.00991
Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux

Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still under-performs other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% relative.
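
The idea of augmenting only the past segments can be illustrated with a minimal sketch. This is not the WavAugment API: the function names, the fixed sample index used to split "past" from "future", and the Gaussian noise source are all assumptions made for illustration, and only additive noise (one of the three augmentations named in the abstract) is shown.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` at the requested signal-to-noise ratio (dB)."""
    # Scale the noise so that signal power / scaled noise power == 10^(snr_db / 10).
    scale = math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def augment_past_only(waveform, split, snr_db, rng):
    """Hypothetical helper: add noise before `split`, leave the rest untouched."""
    past, future = waveform[:split], waveform[split:]
    noise = [rng.gauss(0.0, 1.0) for _ in past]
    return add_noise_at_snr(past, noise, snr_db) + future


rng = random.Random(0)
sample_rate = 16000
# One second of a synthetic 220 Hz tone stands in for a speech waveform.
waveform = [math.sin(2 * math.pi * 220.0 * n / sample_rate) for n in range(sample_rate)]
augmented = augment_past_only(waveform, split=8000, snr_db=10.0, rng=rng)
```

In this sketch the samples after `split` are returned unchanged, so a model trained on `augmented` would still be asked to predict clean future targets from a noisy context.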

Updated: 2020-07-03