Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks,arXiv - CS - Sound

arXiv - CS - Sound, Pub Date: 2021-01-13, DOI: arxiv-2101.05014
Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

Recent research on time-domain audio separation networks (TasNets) has brought great success to speech separation. Nevertheless, conventional TasNets struggle to satisfy the memory and latency constraints of industrial applications. To address this, we design a low-cost, high-performance architecture, the globally attentive locally recurrent (GALR) network. Like the dual-path RNN (DPRNN), we first split a feature sequence into 2D segments and then process the sequence along both the intra- and inter-segment dimensions. Our main innovation is that, on top of features recurrently processed along the intra-segment dimension, GALR applies a self-attention mechanism to the sequence along the inter-segment dimension, which aggregates context-aware information and also enables parallelization. Our experiments suggest that GALR is a notably more effective network than prior work. On the one hand, with only 1.5M parameters, it achieves comparable separation performance at a much lower cost, with 36.1% less runtime memory and 49.4% fewer computational operations relative to the DPRNN. On the other hand, at a model size comparable to DPRNN, GALR consistently outperforms DPRNN on three datasets, in particular with a substantial margin of 2.4 dB absolute improvement in SI-SNRi on the benchmark WSJ0-2mix task.
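The segment-then-process idea in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The function names (`segment`, `local_recurrence`, `global_attention`), the segment length, the hop size, and the zero-padding are all assumptions made for illustration. The leaky average only stands in for a learned intra-segment RNN, and the attention uses identity projections (Q = K = V) rather than learned ones.

```python
import math

def segment(seq, seg_len, hop):
    """Split a list of feature vectors into overlapping segments of length seg_len.
    The tail is zero-padded so that every segment is full (an assumed convention)."""
    dim = len(seq[0])
    n_seg = max(1, math.ceil((len(seq) - seg_len) / hop) + 1)
    pad = (n_seg - 1) * hop + seg_len - len(seq)
    padded = seq + [[0.0] * dim for _ in range(pad)]
    return [padded[s * hop : s * hop + seg_len] for s in range(n_seg)]

def local_recurrence(segments, decay=0.5):
    """Toy stand-in for the intra-segment RNN: h_t = decay * h_{t-1} + (1 - decay) * x_t,
    run independently inside each segment."""
    out = []
    for seg in segments:
        h = [0.0] * len(seg[0])
        states = []
        for x in seg:
            h = [decay * hi + (1.0 - decay) * xi for hi, xi in zip(h, x)]
            states.append(h)
        out.append(states)
    return out

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors):
    """Scaled dot-product self-attention with identity projections (Q = K = V)."""
    dim = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)])
    return out

def global_attention(segments):
    """Apply self-attention along the inter-segment axis, separately for each
    position inside a segment. Each position is independent of the others,
    so these calls could run in parallel."""
    n_seg, seg_len = len(segments), len(segments[0])
    out = [[None] * seg_len for _ in range(n_seg)]
    for k in range(seg_len):
        attended = attend([segments[s][k] for s in range(n_seg)])
        for s in range(n_seg):
            out[s][k] = attended[s]
    return out

# Ten 2-dimensional feature frames, segments of 4 frames with 50% overlap.
features = [[float(t), 1.0] for t in range(10)]
segs = segment(features, seg_len=4, hop=2)
out = global_attention(local_recurrence(segs))
print(len(out), len(out[0]), len(out[0][0]))  # 4 4 2
```

In this sketch the recurrent step only ever sees the frames of a single short segment, while the attention step lets every segment exchange information with every other segment in one pass. That matches the local/global division of labor the abstract describes.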


Updated: 2021-01-14
