NMF-weighted SRP for multi-speaker direction of arrival estimation: robustness to spatial aliasing while exploiting sparsity in the atom-time domain
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2021-03-03, DOI: 10.1186/s13636-021-00201-y
Sushmita Thakallapalli, Suryakanth V. Gangashetty, Nilesh Madhu

Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sparsity of speech in some representation for this purpose. Whereas the broadband approaches exploit time-domain sparsity for multi-speaker localization, narrowband approaches can additionally exploit sparsity and disjointness in the time-frequency representation. Broadband approaches are robust to spatial aliasing but do not optimally exploit the frequency domain sparsity, leading to poor localization performance for arrays with short inter-microphone distances. Narrowband approaches, on the other hand, are vulnerable to spatial aliasing, making them unsuitable for arrays with large inter-microphone spacing. Proposed here is an approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous multiple source localization. The decomposition into atoms is performed in situ using non-negative matrix factorization (NMF) of the short-term amplitude spectra and the localization estimate is obtained via a broadband steered-response power (SRP) approach for each active atom of a time frame. This SRP-NMF approach thereby combines the advantages of the narrowband and broadband approaches and performs well on the multi-speaker localization task for a broad range of inter-microphone spacings. On tests conducted on real-world data from public challenges such as SiSEC and LOCATA, and on data generated from recorded room impulse responses, the SRP-NMF approach outperforms the commonly used variants of narrowband and broadband localization approaches in terms of source detection capability and localization accuracy.
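
The pipeline described in the abstract (NMF of the short-term amplitude spectra, then a broadband SRP evaluated per active atom) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: it uses only two microphones, a far-field model, PHAT normalization of the cross-spectrum, multiplicative-update NMF with a Euclidean cost, and a simple relative threshold on the activations to decide which atoms count as "active". The function names (`nmf`, `atom_weighted_srp`, `doa_histogram`) and all parameter values are hypothetical.

```python
import numpy as np

def nmf(V, n_atoms, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative-update NMF (Euclidean cost): V is approximated by W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + eps   # spectral atoms (columns)
    H = rng.random((n_atoms, n_frames)) + eps # atom activations per frame
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def atom_weighted_srp(X1, X2, W, freqs, mic_dist, angles, c=343.0):
    """Broadband SRP per atom for one microphone pair (assumed simplification).

    X1, X2 : complex STFTs, shape (n_freq, n_frames)
    W      : non-negative atoms, shape (n_freq, n_atoms), used as frequency weights
    Returns an array of shape (n_atoms, n_frames, n_angles).
    """
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)            # PHAT normalization (assumed)
    tau = mic_dist * np.cos(angles) / c                # far-field TDOA per candidate angle
    steer = np.exp(-2j * np.pi * freqs[:, None] * tau[None, :])  # (n_freq, n_angles)
    # Sum over frequency, weighting each bin by the atom's spectral shape.
    return np.real(np.einsum("fk,ft,fa->kta", W, cross, steer))

def doa_histogram(srp, H, rel_threshold=0.1):
    """Histogram of per-atom DOA estimates over active atom-frame pairs.

    'Active' here means H above a fraction of its maximum (an assumed rule).
    """
    active = H > rel_threshold * H.max()
    best = srp.argmax(axis=-1)                         # best angle index per (atom, frame)
    return np.bincount(best[active], minlength=srp.shape[-1])

# Synthetic check: one source at 60 degrees, 20 cm spacing, bins up to 8 kHz.
# With this spacing, single bins above roughly c / (2 * mic_dist) = 857 Hz are
# spatially aliased, yet the broadband sum still has a unique peak.
rng = np.random.default_rng(1)
n_freq, n_frames = 129, 20
freqs = np.linspace(0.0, 8000.0, n_freq)
angles = np.deg2rad(np.arange(181))                    # 0..180 degrees in 1-degree steps
mic_dist, c = 0.2, 343.0
true_tau = mic_dist * np.cos(np.deg2rad(60.0)) / c
S = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
X1 = S
X2 = S * np.exp(-2j * np.pi * freqs[:, None] * true_tau)  # mic 2 receives a delayed copy

W, H = nmf(np.abs(X1), n_atoms=2)
srp = atom_weighted_srp(X1, X2, W, freqs, mic_dist, angles)
hist = doa_histogram(srp, H)
print("estimated DOA (degrees):", np.rad2deg(angles[hist.argmax()]))
```

With several simultaneous speakers, the expectation is that different atoms dominate for different sources, so the histogram over atom-frame pairs would show one peak per source. How many atoms to use and how the per-atom estimates are clustered are left open here, since the abstract does not specify them.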

Updated: 2021-03-04