Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation
arXiv - CS - Sound Pub Date : 2019-10-14 , DOI: arxiv-1910.06379
Yi Luo, Zhuo Chen, Takuya Yoshioka

Recent studies in deep learning-based speech separation have shown the superiority of time-domain approaches over conventional time-frequency-based methods. Unlike time-frequency domain approaches, time-domain separation systems often receive input sequences consisting of a huge number of time steps, which makes modeling such extremely long sequences challenging. Conventional recurrent neural networks (RNNs) are not effective for modeling such long sequences due to optimization difficulties, while one-dimensional convolutional neural networks (1-D CNNs) cannot perform utterance-level sequence modeling when their receptive field is smaller than the sequence length. In this paper, we propose the dual-path recurrent neural network (DPRNN), a simple yet effective method for organizing RNN layers in a deep structure to model extremely long sequences. DPRNN splits the long sequential input into smaller chunks and applies intra- and inter-chunk operations iteratively, where the input length of each operation can be made proportional to the square root of the original sequence length. Experiments show that by replacing the 1-D CNN with DPRNN and applying sample-level modeling in the time-domain audio separation network (TasNet), a new state-of-the-art performance on WSJ0-2mix is achieved with a model 20 times smaller than the previous best system.
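The square-root claim in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes 50% overlapping chunks, zero padding at both ends, and a chunk size near sqrt(2L), and the names `segment` and `suggested_chunk_size` are hypothetical helpers introduced only for this example. Under those assumptions, both the chunk length (seen by an intra-chunk operation) and the number of chunks (seen by an inter-chunk operation) come out close to sqrt(2L).

```python
import math


def suggested_chunk_size(seq_len):
    """Pick an even chunk size close to sqrt(2 * seq_len) (an assumption of this sketch)."""
    k = max(2, round(math.sqrt(2 * seq_len)))
    return k + (k % 2)  # round up to an even number so the hop is exactly half


def segment(seq, chunk_size, pad_value=0.0):
    """Split seq into 50%-overlapping chunks of length chunk_size.

    The sequence is zero-padded at the front and back so that every
    original sample is covered by exactly two chunks.
    """
    if chunk_size < 2 or chunk_size % 2:
        raise ValueError("chunk_size must be an even integer >= 2")
    hop = chunk_size // 2
    rest = (-len(seq)) % hop  # padding needed to reach a multiple of hop
    padded = [pad_value] * hop + list(seq) + [pad_value] * (rest + hop)
    num_chunks = (len(padded) - chunk_size) // hop + 1
    return [padded[i * hop : i * hop + chunk_size] for i in range(num_chunks)]


if __name__ == "__main__":
    seq_len = 32000  # hypothetical input length, chosen only for illustration
    k = suggested_chunk_size(seq_len)
    chunks = segment([0.0] * seq_len, k)
    # Intra-chunk operations see sequences of length k;
    # inter-chunk operations see sequences of length len(chunks).
    print("chunk size:", k, "number of chunks:", len(chunks))
```

With this layout, an RNN applied within a chunk and an RNN applied across chunks each process a sequence far shorter than the original input, which is the property the abstract describes.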

Updated: 2020-03-30