Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks
arXiv - CS - Sound | Pub Date: 2021-01-13 | DOI: arxiv-2101.05014 | Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
Recent research on the time-domain audio separation networks (TasNets) has
brought great success to speech separation. Nevertheless, conventional TasNets
struggle to satisfy the memory and latency constraints in industrial
applications. To this end, we design a low-cost, high-performance
architecture, namely the globally attentive locally recurrent (GALR) network.
Like the dual-path RNN (DPRNN), we first split a feature sequence into 2D
segments and then process the sequence along both the intra- and inter-segment
dimensions. Our main innovation is that, on top of features recurrently
processed along the intra-segment dimension, GALR applies a self-attention
mechanism to the sequence along the inter-segment dimension, which aggregates
context-aware information and also enables parallelization. Our experiments
suggest that GALR is notably more effective than prior work. On
the one hand, with only 1.5M parameters, it achieves comparable separation
performance at a much lower cost, with 36.1% less runtime memory and 49.4% fewer
computational operations relative to DPRNN. On the other hand, at a model size
comparable to DPRNN, GALR consistently outperforms DPRNN on
three datasets, in particular by a substantial margin of 2.4 dB absolute
improvement in SI-SNRi on the benchmark WSJ0-2mix task.
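
The segmentation and inter-segment attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (segment, attend, global_attention), the end-only zero padding, and the projection-free single-head attention are all assumptions made for illustration. The learned projections, normalization, positional information, and the locally recurrent (intra-segment) stage of the actual model are omitted.

```python
import math


def segment(frames, seg_len, hop):
    """Split a list of feature vectors into overlapping segments.

    Hypothetical helper: zero-pads only at the end so the last segment is full.
    """
    dim = len(frames[0])
    n_seg = max(1, math.ceil((len(frames) - seg_len) / hop) + 1)
    padded_len = (n_seg - 1) * hop + seg_len
    padded = frames + [[0.0] * dim for _ in range(padded_len - len(frames))]
    return [padded[s * hop: s * hop + seg_len] for s in range(n_seg)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(seq):
    """Scaled dot-product self-attention over a list of vectors.

    Assumption: queries, keys and values are the inputs themselves
    (no learned projections, one head).
    """
    scale = math.sqrt(len(seq[0]))
    out = []
    for query in seq:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq))
                    for d in range(len(query))])
    return out


def global_attention(segments):
    """Attend across segments, independently for each intra-segment position."""
    n_seg, seg_len = len(segments), len(segments[0])
    out = [[None] * seg_len for _ in range(n_seg)]
    for k in range(seg_len):
        # The k-th frame of every segment forms one inter-segment sequence.
        column = [segments[s][k] for s in range(n_seg)]
        for s, vec in enumerate(attend(column)):
            out[s][k] = vec
    return out


# Toy input: 10 frames with 2 features each, 50% overlap between segments.
frames = [[float(t), 1.0] for t in range(10)]
segments = segment(frames, seg_len=4, hop=2)
mixed = global_attention(segments)
print(len(segments), len(segments[0]), len(mixed[0][0]))
```

In this sketch each intra-segment position is attended independently of the others, so the loop over k could run in parallel, which is the property the abstract points to. The attention sequence length equals the number of segments rather than the number of frames.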
Updated: 2021-01-14