Towards Model Compression for Deep Learning Based Speech Enhancement
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1). Pub Date: 2021-05-20. DOI: 10.1109/taslp.2021.3082282
Ke Tan, DeLiang Wang

The use of deep neural networks (DNNs) has dramatically elevated the performance of speech enhancement over the last decade. However, achieving strong enhancement performance typically requires a large DNN, which is both memory- and computation-intensive, making it difficult to deploy such speech enhancement systems on devices with limited hardware resources or in applications with strict latency requirements. In this study, we propose two compression pipelines to reduce the model size for DNN-based speech enhancement, which incorporate three different techniques: sparse regularization, iterative pruning, and clustering-based quantization. We systematically investigate these techniques and evaluate the proposed compression pipelines. Experimental results demonstrate that our approach reduces the sizes of four different models by large margins without significantly sacrificing their enhancement performance. In addition, we find that the proposed approach performs well on speaker separation, which further demonstrates the effectiveness of the approach for compressing speech separation models.
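The abstract names three compression techniques but does not spell them out. Below is a minimal, illustrative sketch (not the authors' implementation) of how sparse regularization, iterative magnitude pruning, and clustering-based weight quantization are commonly combined in a compression pipeline, applied here to a toy fully-connected enhancement model in PyTorch. All layer sizes, sparsity levels, regularization weights, and codebook sizes are illustrative assumptions.

```python
# Hedged sketch of a compression pipeline: (1) L1 sparse regularization,
# (2) iterative magnitude pruning, (3) k-means (clustering-based) quantization.
# Hyperparameters and the toy model below are assumptions for illustration only.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Toy mask-estimation network (161 frequency bins is an illustrative choice).
model = nn.Sequential(nn.Linear(161, 512), nn.ReLU(), nn.Linear(512, 161))

# (1) Sparse regularization: add an L1 penalty on the weights to the training
# loss, pushing many weights toward zero so they can later be pruned cheaply.
def loss_with_l1(pred, target, lam=1e-5):
    mse = nn.functional.mse_loss(pred, target)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return mse + lam * l1

# (2) Iterative magnitude pruning: in each round, zero out the smallest-magnitude
# weights of every Linear layer; a real pipeline fine-tunes between rounds so the
# surviving weights can recover the lost accuracy.
def prune_round(sparsity):
    for m in model.modules():
        if isinstance(m, nn.Linear):
            w = m.weight.data
            k = int(sparsity * w.numel())
            if k > 0:
                thresh = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > thresh).float())

# (3) Clustering-based quantization: replace the surviving (non-zero) weights of
# each layer with their nearest k-means centroid, so only a small codebook plus
# per-weight indices need to be stored.
def quantize_layer(linear, n_clusters=16):
    w = linear.weight.data
    nz = w[w != 0].reshape(-1, 1).numpy()
    if len(nz) < n_clusters:
        return
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(nz)
    codebook = torch.tensor(km.cluster_centers_.squeeze(), dtype=w.dtype)
    labels = torch.as_tensor(km.labels_, dtype=torch.long)
    w[w != 0] = codebook[labels]

# Example pipeline: prune with an increasing sparsity schedule, then quantize.
for sparsity in (0.5, 0.7, 0.9):
    prune_round(sparsity)
    # ... fine-tune here with loss_with_l1(...) before the next round ...
for m in model.modules():
    if isinstance(m, nn.Linear):
        quantize_layer(m)
```

The design intuition behind ordering the steps this way is that the L1 penalty concentrates energy in fewer weights, pruning then removes the near-zero ones, and quantization compresses whatever remains; the paper's own pipelines and schedules may differ.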

Updated: 2021-05-20