A Real-time Speaker Diarization System Based on Spatial Spectrum
arXiv - CS - Sound, Pub Date: 2021-07-20, DOI: arxiv-2107.09321
Siqi Zheng, Weilong Huang, Xianliang Wang, Hongbin Suo, Jinwei Feng, Zhijie Yan

In this paper we describe a speaker diarization system that localizes and identifies all speakers present in a conversation or meeting. We propose a novel systematic approach to several long-standing challenges in speaker diarization: (1) segmenting and separating overlapping speech from two speakers; (2) estimating the number of speakers when participants may enter or leave the conversation at any time; (3) providing accurate speaker identification on short, text-independent utterances; (4) tracking speakers' movements during the conversation; and (5) detecting speaker changes in real time. First, a differential directional microphone array-based approach is used to capture the target speakers' voices in adverse far-field environments. Second, an online speaker-location joint clustering approach is proposed to keep track of speaker locations. Third, an instant speaker-number detector is developed to trigger the mechanism that separates overlapped speech. The results suggest that our system effectively incorporates spatial information and achieves significant gains.
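
The abstract names a differential directional microphone array but does not describe its design. For orientation only, the textbook first-order differential array subtracts a delayed copy of one microphone's signal from the other's. The sketch below computes that generic two-microphone delay-and-subtract response. It is not the authors' configuration: the 2 cm spacing, the internal delay, the 343 m/s speed of sound, and the function name `first_order_dma_response` are all assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_order_dma_response(theta: float, freq_hz: float,
                             spacing_m: float, delay_s: float) -> float:
    """Magnitude response of a generic two-mic delay-and-subtract array.

    Output: y(t) = x_front(t) - x_back(t - delay_s).
    A plane wave from angle theta (0 rad = front endfire) reaches the back
    microphone spacing_m * cos(theta) / c seconds after the front one, so
    the transfer function is 1 - exp(-j*w*(delay_s + spacing_m*cos(theta)/c)).
    """
    w = 2.0 * math.pi * freq_hz
    total_delay = delay_s + spacing_m * math.cos(theta) / SPEED_OF_SOUND
    return abs(1.0 - cmath.exp(-1j * w * total_delay))


if __name__ == "__main__":
    d = 0.02                  # hypothetical 2 cm microphone spacing
    tau = d / SPEED_OF_SOUND  # internal delay chosen to put a null at the rear
    for deg in (0, 90, 180):
        gain = first_order_dma_response(math.radians(deg), 1000.0, d, tau)
        print(f"{deg:3d} deg: {gain:.4f}")
```

If the internal delay equals the acoustic travel time across the pair (tau = d / c), the total delay at 180° is zero and the response vanishes there. This yields a cardioid-like pattern that suppresses sound from behind the array. Other delay values move the null to other angles.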
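
The abstract does not say how the online speaker-location joint clustering works. The sketch below shows one generic way to cluster incoming segments on both a voice embedding and a direction of arrival (DOA), so treat it as an illustration rather than the authors' algorithm. Several pieces are hypothetical: the linear weighting `alpha` between voice distance and angular distance, the new-speaker `threshold`, the class and function names, and the policy of replacing a cluster's angle with the latest observation so that a moving speaker can be followed.

```python
import math
from dataclasses import dataclass


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, in [0, 2]; returns 1.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def angle_distance(a_deg: float, b_deg: float) -> float:
    """Circular difference between two DOA angles, in [0, 180] degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


@dataclass
class SpeakerCluster:
    centroid: list[float]  # running mean of voice embeddings
    angle_deg: float       # most recent DOA assigned to this speaker
    count: int = 1


class OnlineJointClusterer:
    """Hypothetical online clustering over (voice embedding, DOA) pairs."""

    def __init__(self, alpha: float = 0.5, threshold: float = 0.4):
        self.alpha = alpha          # weight on voice vs. location (assumed)
        self.threshold = threshold  # score above this opens a new speaker (assumed)
        self.clusters: list[SpeakerCluster] = []

    def score(self, c: SpeakerCluster, emb: list[float], angle_deg: float) -> float:
        voice = cosine_distance(c.centroid, emb)
        loc = angle_distance(c.angle_deg, angle_deg) / 180.0  # scaled to [0, 1]
        return self.alpha * voice + (1.0 - self.alpha) * loc

    def assign(self, emb: list[float], angle_deg: float) -> int:
        """Return the speaker index for one incoming segment."""
        best, best_score = -1, float("inf")
        for i, c in enumerate(self.clusters):
            s = self.score(c, emb, angle_deg)
            if s < best_score:
                best, best_score = i, s
        if best < 0 or best_score > self.threshold:
            # No existing speaker is close enough: open a new cluster.
            self.clusters.append(SpeakerCluster(list(emb), angle_deg))
            return len(self.clusters) - 1
        c = self.clusters[best]
        c.count += 1
        # Incremental mean of the voice embedding.
        c.centroid = [m + (x - m) / c.count for m, x in zip(c.centroid, emb)]
        # Follow the speaker's latest position so movement is tracked.
        c.angle_deg = angle_deg
        return best


if __name__ == "__main__":
    clusterer = OnlineJointClusterer()
    print(clusterer.assign([1.0, 0.0], 30.0))   # 0: first speaker
    print(clusterer.assign([0.0, 1.0], 200.0))  # 1: new voice, new direction
    print(clusterer.assign([0.9, 0.1], 40.0))   # 0: same voice, moved slightly
```

Because the decision combines both cues, the sketch can follow a speaker who moves gradually (the voice term stays small) and can still separate two similar-sounding speakers seated in different directions (the location term dominates).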

Updated: 2021-07-21