Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation, arXiv - CS - Sound


arXiv - CS - Sound, Pub Date: 2020-07-06, DOI: arxiv-2007.02683
Pyry Pyykkönen, Styliannos I. Mimilakis, Konstantinos Drossos, and Tuomas Virtanen

Recent approaches for music source separation are almost exclusively based on deep neural networks, mostly employing recurrent neural networks (RNNs). Although RNNs are in many cases superior to other types of deep neural networks for sequence processing, they are known to be difficult to train and to parallelize, especially for the long sequences typically encountered in music source separation. In this paper we present a use case of replacing RNNs with depth-wise separable (DWS) convolutions, which are a lightweight and faster variant of typical convolutions. We focus on singing voice separation: starting from an RNN architecture, we replace the RNNs with DWS convolutions (DWS-CNNs). We conduct an ablation study and examine the effect of the number of channels and layers of DWS-CNNs on the source separation performance, using the standard metrics of signal-to-artifacts, signal-to-interference, and signal-to-distortion ratio. Our results show that replacing RNNs with DWS-CNNs yields an improvement of 1.20, 0.06, and 0.37 dB, respectively, while using only 20.57% of the number of parameters of the RNN architecture.
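To illustrate why DWS convolutions are described as lightweight, the following minimal sketch counts the trainable parameters of a standard 2D convolution and of a depth-wise separable one (a per-channel depth-wise convolution followed by a 1x1 point-wise convolution). The function names and the layer sizes (64 channels, 5x5 kernel) are assumptions chosen for illustration only; they are not taken from the paper, and the 20.57% figure in the abstract compares whole architectures, not a single layer as done here.

```python
def standard_conv_params(c_in, c_out, k_h, k_w, bias=True):
    """Parameters of a standard 2D convolution: one k_h x k_w kernel
    per (input channel, output channel) pair, plus optional biases."""
    return c_in * c_out * k_h * k_w + (c_out if bias else 0)


def dws_conv_params(c_in, c_out, k_h, k_w, bias=True):
    """Parameters of a depth-wise separable 2D convolution."""
    # Depth-wise step: one k_h x k_w kernel per input channel.
    depthwise = c_in * k_h * k_w + (c_in if bias else 0)
    # Point-wise step: a 1x1 convolution that mixes the channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes (hypothetical, not from the paper).
    standard = standard_conv_params(64, 64, 5, 5)
    separable = dws_conv_params(64, 64, 5, 5)
    print(f"standard conv: {standard} parameters")
    print(f"DWS conv:      {separable} parameters")
    print(f"ratio:         {separable / standard:.2%}")
```

In this hypothetical setting the saving comes from factorizing the kernel: spatial filtering is done once per input channel, and channel mixing is left to the much cheaper 1x1 convolution.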


Updated: 2020-07-08
