Localization Based Sequential Grouping for Continuous Speech Separation
arXiv - CS - Sound, Pub Date: 2021-07-14, DOI: arxiv-2107.06853, Authors: Zhong-Qiu Wang, DeLiang Wang
This study investigates robust speaker localization for continuous speech
separation and speaker diarization, where we use speaker directions to group
non-contiguous segments of the same speaker. Assuming that speakers do not move
and are located in different directions, the direction of arrival (DOA)
information provides an informative cue for accurate sequential grouping and
speaker diarization. Our system is block-online in the following sense. Given a
block of frames with at most two speakers, we apply a two-speaker separation
model to separate (and enhance) the speakers, estimate the DOA of each
separated speaker, and group the separation results across blocks based on the
DOA estimates. Speaker diarization and speaker-attributed speech recognition
results on the LibriCSS corpus demonstrate the effectiveness of the proposed
algorithm.
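
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each separated stream already comes with a single azimuth estimate in degrees, and it uses a simple nearest-direction rule with a hypothetical threshold (`threshold_deg`). The names `DoaGrouper`, `angular_distance`, `assign`, and `assign_block` are invented for illustration.

```python
from dataclasses import dataclass, field


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees (0..180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


@dataclass
class DoaGrouper:
    """Groups separated streams across blocks by their DOA estimates.

    Assumption (from the abstract): speakers do not move and sit in
    different directions, so one reference azimuth per speaker suffices.
    """

    threshold_deg: float = 15.0  # hypothetical tolerance, not from the paper
    speaker_doas: list = field(default_factory=list)  # reference azimuth per speaker

    def assign(self, doa_deg: float) -> int:
        """Return the speaker index for one stream, creating a speaker if needed."""
        best_id, best_dist = None, None
        for spk_id, ref_deg in enumerate(self.speaker_doas):
            dist = angular_distance(doa_deg, ref_deg)
            if best_dist is None or dist < best_dist:
                best_id, best_dist = spk_id, dist
        if best_id is not None and best_dist <= self.threshold_deg:
            return best_id
        # No known speaker is close enough: register a new direction.
        self.speaker_doas.append(doa_deg)
        return len(self.speaker_doas) - 1

    def assign_block(self, block_doas: list) -> list:
        """Assign every separated stream in one block (at most two in the paper)."""
        return [self.assign(d) for d in block_doas]


if __name__ == "__main__":
    grouper = DoaGrouper(threshold_deg=15.0)
    # Each inner list holds the DOA estimates of the streams in one block.
    for block in ([30.0, 200.0], [205.0], [355.0, 32.0], [5.0]):
        print(block, "->", grouper.assign_block(block))
```

Processing blocks one at a time mirrors the block-online setting: a stream is attributed to a speaker as soon as its DOA is available, so non-contiguous segments from the same direction end up under the same index. This sketch does not prevent two streams in one block from mapping to the same speaker, and it keeps the first observed azimuth as the reference; a practical system would likely need both refinements.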
Updated: 2021-07-15