A multichannel learning-based approach for sound source separation in reverberant environments

EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2021-11-20, DOI: 10.1186/s13636-021-00227-2
You-Siang Chen, Zi-Jie Lin, Mingsian R. Bai

In this paper, a multichannel learning-based network is proposed for sound source separation in reverberant fields. The network is divided into two parts according to the training strategy. In the first stage, time-dilated convolutional blocks are trained to estimate the array weights for beamforming the multichannel microphone signals. The output of this network is then processed by a weight-and-sum operation that is reformulated to handle real-valued data in the frequency domain. In the second stage, a U-net model is concatenated to the beamforming network to serve as a nonlinear mapping filter for joint separation and dereverberation. The scale-invariant mean square error (SI-MSE), a frequency-domain modification of the scale-invariant signal-to-noise ratio (SI-SNR), is used as the objective function for training. Furthermore, the combined network is trained with speech segments filtered by a wide variety of room impulse responses. Simulations are conducted for comprehensive multisource scenarios covering various subtending angles of the sources and various reverberation times. The proposed network is compared with several baseline approaches in terms of objective evaluation metrics. The results demonstrate the excellent performance of the proposed network in dereverberation and separation relative to the baseline methods.
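Two of the building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The first function shows one way a complex weight-and-sum beamformer can be written using only real-valued arithmetic for a single time-frequency bin; the conjugate convention y = sum over m of conj(w_m) x_m is an assumption, since the abstract does not state the paper's convention. The second function computes the commonly used time-domain SI-SNR. The abstract does not define the SI-MSE objective, so it is not reproduced here. The function names and the `eps` parameter are hypothetical.

```python
import math


def weight_and_sum_real(w_re, w_im, x_re, x_im):
    """Weight-and-sum for one time-frequency bin using real arithmetic only.

    Computes y = sum_m conj(w_m) * x_m, where w_m = w_re[m] + j*w_im[m] are
    the array weights and x_m = x_re[m] + j*x_im[m] are the microphone
    spectra. The conjugate convention is an assumption for illustration.
    """
    # conj(w) * x = (wr*xr + wi*xi) + j*(wr*xi - wi*xr)
    y_re = sum(wr * xr + wi * xi
               for wr, wi, xr, xi in zip(w_re, w_im, x_re, x_im))
    y_im = sum(wr * xi - wi * xr
               for wr, wi, xr, xi in zip(w_re, w_im, x_re, x_im))
    return y_re, y_im


def si_snr(estimate, target, eps=1e-8):
    """Standard time-domain SI-SNR in dB between two equal-length signals."""
    n = len(target)
    # Remove the mean of each signal before projecting.
    e_mean = sum(estimate) / n
    t_mean = sum(target) / n
    e = [v - e_mean for v in estimate]
    t = [v - t_mean for v in target]
    # Project the estimate onto the target to get the scaled target part.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t)
    scale = dot / (t_energy + eps)
    s_target = [scale * b for b in t]
    # Whatever is left after removing the target part counts as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(v * v for v in e_noise)
    return 10.0 * math.log10((num + eps) / (den + eps))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by a nonzero constant leaves the SI-SNR essentially unchanged, which is the property the term "scale-invariant" refers to.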


Updated: 2021-11-20
