

Audio source separation by activity probability detection with maximum correlation and simplex geometry
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2021-01-28, DOI: 10.1186/s13636-021-00195-7
Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot

Two novel methods are presented for speaker separation in multi-microphone recordings; both can also detect speakers with infrequent activity. The proposed methods are based on a statistical model of the speakers' activity probabilities across time, and each takes a different approach to estimating these probabilities. The first method is derived from a linear programming (LP) problem that maximizes the correlation function between different time frames. It is shown that the obtained maxima correspond to frames that contain a single active speaker. Accordingly, we propose an algorithm for successive identification of frames dominated by each speaker. The second method aggregates the correlation values associated with each frame into a correlation vector. We show that these correlation vectors lie in a simplex whose vertices correspond to frames dominated by one of the speakers. In this method, we utilize convex geometry tools to sequentially detect the simplex vertices. The correlation functions associated with the single-speaker frames detected by either of the two proposed methods are used for recovering the activity probabilities. A spatial mask is estimated from the recovered probabilities and is utilized for separation and enhancement by means of both spatial and spectral processing. Experimental results on real-life recordings with different reverberation and noise levels demonstrate the performance of the proposed methods in various conditions, outperforming a state-of-the-art separation method.
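
The simplex idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' algorithm: the abstract does not say which convex geometry tool is used, so the code below substitutes the well-known successive projection algorithm as a stand-in, and it runs on synthetic points rather than real correlation vectors. All names (successive_projection, recover_probabilities, single_speaker_frames) and all sizes are hypothetical, chosen only for illustration. The sketch assumes noiseless data, linearly independent vertices, and at least one single-speaker frame per speaker.

```python
import numpy as np

def successive_projection(points, num_vertices):
    """Return row indices of `points` picked as simplex vertices.

    Stand-in technique (successive projection), not the paper's method.
    """
    residual = np.asarray(points, dtype=float).copy()
    picked = []
    for _ in range(num_vertices):
        # A convex function (the norm) peaks at a vertex of the simplex.
        norms = np.linalg.norm(residual, axis=1)
        idx = int(np.argmax(norms))
        picked.append(idx)
        direction = residual[idx] / norms[idx]
        # Remove the component along the chosen vertex from every row.
        residual -= np.outer(residual @ direction, direction)
    return picked

def recover_probabilities(points, vertex_rows):
    """Least-squares mixing weights of each row w.r.t. the chosen vertices."""
    vertices = points[vertex_rows]                  # shape (J, D)
    weights = points @ np.linalg.pinv(vertices)     # shape (L, J)
    weights = np.clip(weights, 0.0, None)           # guard against round-off
    return weights / weights.sum(axis=1, keepdims=True)

# Synthetic stand-in for per-frame correlation vectors (assumed sizes).
rng = np.random.default_rng(0)
num_speakers, num_frames, dim = 3, 200, 8
true_vertices = rng.random((num_speakers, dim))
probs = rng.dirichlet(np.ones(num_speakers), size=num_frames)

# Force three frames to be dominated by exactly one speaker each.
single_speaker_frames = [10, 50, 120]
probs[single_speaker_frames] = np.eye(num_speakers)

# Every frame is a convex combination of the vertices, so it lies in the simplex.
points = probs @ true_vertices

vertex_rows = successive_projection(points, num_speakers)
estimated = recover_probabilities(points, vertex_rows)
print("detected single-speaker frames:", sorted(vertex_rows))
```

In this toy setting the detected rows are the frames that were forced to contain a single speaker, and the recovered weights match the synthetic activity probabilities up to a reordering of the speakers. Real recordings are noisy, so any practical version would need a more robust vertex detector and a check for cases where no frame is dominated by a single speaker.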


Updated: 2021-01-28
