Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances


arXiv - CS - Sound, Pub Date: 2020-07-06, DOI: arxiv-2007.02780
Stylianos Ioannis Mimilakis, Konstantinos Drossos, Gerald Schuller

In this work, we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation. We build upon a previously proposed method for learning representations of time-domain music signals with a re-parameterized denoising autoencoder, extending it by using the family of Sinkhorn distances with entropic regularization. We evaluate our method on the freely available MUSDB18 dataset of professionally produced music recordings, and our results show that Sinkhorn distances with a small strength of entropic regularization marginally improve the performance of informed singing voice separation. When the strength of the entropic regularization is increased, the learned representations of the mixture signal consist of almost perfectly additive and distinctly structured sources.
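
For readers unfamiliar with the term, the following is a minimal sketch of a Sinkhorn distance with entropic regularization between two discrete distributions, written in plain Python. It is not the authors' implementation: the function name `sinkhorn_distance`, the parameter names `reg` and `n_iters`, their default values, and the toy cost matrix in the usage note are all assumptions made for illustration. The abstract does not say how the paper builds the distributions or the cost matrix from the learned representations.

```python
import math


def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between histograms a and b.

    a, b : lists of non-negative weights, each summing to 1
    cost : cost[i][j] is the cost of moving mass from bin i of a to bin j of b
    reg  : strength of the entropic regularization (illustrative default)
    Note : very large cost / reg ratios can underflow the kernel to zero.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals a and b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the distance is the sum of P * cost
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return dist, plan
```

Calling `sinkhorn_distance([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], reg=0.1)` returns a transport plan concentrated on the diagonal, while a larger `reg` spreads mass across all bins and yields a more diffuse plan. This is the general effect of the regularization strength on the transport plan, not a reproduction of the paper's results.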

Updated: 2020-07-07
