Supervised binaural source separation using auditory attention detection in realistic scenarios
Applied Acoustics (IF 3.4), Pub Date: 2021-04-01, DOI: 10.1016/j.apacoust.2020.107826
Sahar Zakeri, Masoud Geravanchizadeh

Abstract Speech separation in crowded environments is challenging for hearing-impaired listeners. Unfortunately, hearing aids cannot segregate the main speaker from other interferences without knowing which speaker the listener attends to. In this paper, a new robust binaural speech separation system based on supervised deep neural networks (DNNs) is introduced to separate the attended speaker under different simulated room conditions (i.e., noisy and reverberant rooms with different azimuthal locations of the target speaker). For each speaker in the mixture of sound sources, a set of DNNs yields estimated ratio masks, one of which is selected by an auditory attention detection (AAD) procedure to separate the attended speech. The attended speaker is detected among the competing speakers by a bidirectional long short-term memory (Bi-LSTM) network using phase-locking values (PLVs) extracted from electroencephalography (EEG) signals. Two experiments are conducted to assess the performance of the proposed AAD-based binaural separation system against a state-of-the-art monaural segregation system from the literature, used as a baseline. The evaluation of the AAD processing unit shows high detection accuracy compared with the detection method used in the baseline system. Systematic evaluations of the proposed binaural separation system against the baseline also confirm its superiority in terms of various objective intelligibility and quality measures. The new AAD-based binaural separation system is advantageous in several respects: the AAD method determines the direction of attention from single-trial EEG signals without access to the audio signals of the speakers. The model presented in this research can be considered an important processing tool in the structure of neuro-steered hearing aids employed in cocktail-party scenarios.
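For readers unfamiliar with the PLV feature that drives the AAD stage, the following is a minimal sketch (not the authors' code) of how pairwise phase-locking values can be computed from a band-pass-filtered EEG epoch. The channel count, epoch length, and the flattening of the PLV matrix into a feature vector are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical PLV feature extraction for an AAD front end.
# Assumes an EEG epoch of shape (n_channels, n_samples) that has already
# been band-pass filtered; parameters below are placeholders.
import numpy as np
from itertools import combinations
from scipy.signal import hilbert


def plv_matrix(eeg_epoch: np.ndarray) -> np.ndarray:
    """Pairwise PLV between EEG channels for a single epoch.

    PLV(i, j) = | mean_t exp(j * (phi_i(t) - phi_j(t))) |,
    where phi is the instantaneous phase of the analytic signal.
    """
    n_channels, _ = eeg_epoch.shape
    phase = np.angle(hilbert(eeg_epoch, axis=1))  # instantaneous phase per channel
    plv = np.eye(n_channels)
    for i, j in combinations(range(n_channels), 2):
        diff = phase[i] - phase[j]
        value = np.abs(np.mean(np.exp(1j * diff)))
        plv[i, j] = plv[j, i] = value
    return plv


# Example: synthetic 64-channel epoch, 2 s at 256 Hz.
epoch = np.random.randn(64, 512)
# Flatten the upper triangle into a feature vector for the Bi-LSTM classifier.
features = plv_matrix(epoch)[np.triu_indices(64, k=1)]
```

In the proposed system, such PLV features (computed per time window) would be passed to the Bi-LSTM network, whose attention decision then selects which DNN-estimated ratio mask is applied to the binaural mixture.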
