Compute and memory efficient universal sound source separation
arXiv - CS - Sound, Pub Date: 2021-03-03, DOI: arxiv-2103.02644
Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem. In this study, we provide a family of efficient neural network architectures for general-purpose audio source separation while focusing on multiple computational aspects that hinder the application of neural networks in real-world scenarios. The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF), together with their aggregation, which is performed through simple one-dimensional convolutions. This mechanism enables our models to obtain high-fidelity signal separation in a wide variety of settings where a variable number of sources are present and computational resources are limited (e.g., floating point operations, memory footprint, number of parameters, and latency). Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks that have significantly higher computational resource requirements. The causal variation of SuDoRM-RF is able to obtain competitive performance in real-time speech separation of around 10 dB scale-invariant signal-to-distortion ratio improvement (SI-SDRi) while remaining up to 20 times faster than real-time on a laptop device.
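
The SI-SDRi figure quoted above is the gain in scale-invariant signal-to-distortion ratio of the separated estimate over the unprocessed mixture. The following is a minimal sketch of that metric in plain Python, using the commonly used definition (project the estimate onto the reference, then compare target and residual energy). The function names, the mean-removal step, and the `eps` stabilizer are assumptions made for illustration and are not taken from the paper or its evaluation code.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Assumption: both signals are mean-centred first, a common convention
    that the abstract does not confirm.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Optimal scaling of the reference that best explains the estimate.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)

    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy signals (hypothetical): two sinusoids stand in for a source and an interferer.
n_samples = 200
reference = [math.sin(2 * math.pi * 5 * n / n_samples) for n in range(n_samples)]
interferer = [math.sin(2 * math.pi * 13 * n / n_samples + 0.3) for n in range(n_samples)]
mixture = [r + i for r, i in zip(reference, interferer)]
# A pretend separator output that leaves 10% of the interferer amplitude.
estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]

improvement_db = si_sdr_improvement(estimate, mixture, reference)
print(f"SI-SDRi: {improvement_db:.2f} dB")
```

In this toy setup the interferer is attenuated by a factor of 10 in amplitude, so the improvement comes out at roughly 20 dB. Because the reference is rescaled before the comparison, multiplying the estimate by any nonzero constant leaves the score essentially unchanged, which is what "scale-invariant" refers to.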

Updated: 2021-03-05