Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
arXiv - CS - Sound. Pub Date: 2020-07-02, DOI: arxiv-2007.00991. Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, and Emmanuel Dupoux
Contrastive Predictive Coding (CPC), which predicts future segments of
speech from past segments, is emerging as a powerful algorithm for
representation learning of the speech signal. However, it still underperforms
other methods on unsupervised evaluation benchmarks. Here, we introduce
WavAugment, a time-domain data augmentation library, and find that applying
augmentation to the past segments is generally more efficient and yields
better performance than other methods. We find that a combination of pitch
modification, additive noise, and reverberation substantially increases the
performance of CPC (relative improvement of 18-22%), beating the reference
Libri-light results with 600 times less data. Using an out-of-domain dataset,
time-domain data augmentation can push CPC to be on par with the state of the
art on the Zero Speech Benchmark 2017. We also show that time-domain data
augmentation consistently improves downstream limited-supervision phoneme
classification tasks by 12-15% relative.
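
As a rough illustration of what a time-domain augmentation applied only to the past can look like, the sketch below adds noise at a chosen signal-to-noise ratio to the first part of a waveform and leaves the rest untouched. It is not the WavAugment API and not the authors' pipeline: the function names, the split point, the 10 dB SNR, the sine-wave input, and the use of white Gaussian noise as a stand-in for additive noise are all assumptions made for this example. Pitch modification and reverberation are omitted for brevity.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Pick the scale so that signal_power / (scale^2 * noise_power) = 10^(snr_db / 10).
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]


def augment_past_only(waveform, split, snr_db, seed=0):
    """Augment the samples before `split`; leave the later (future) samples untouched."""
    rng = random.Random(seed)
    past, future = waveform[:split], waveform[split:]
    return add_noise_at_snr(past, snr_db, rng) + future


# Toy input (hypothetical values): one second of a 220 Hz sine sampled at 16 kHz.
sample_rate = 16000
wave = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
augmented = augment_past_only(wave, split=sample_rate // 2, snr_db=10.0)
```

Here the first half of the signal plays the role of the past context and the second half the role of the future to be predicted; in an actual CPC setup the split would follow the model's own windowing rather than a fixed midpoint.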
Updated: 2020-07-03