Localization Based Sequential Grouping for Continuous Speech Separation

arXiv - CS - Sound. Pub Date: 2021-07-14, DOI: arxiv-2107.06853
Zhong-Qiu Wang, DeLiang Wang

This study investigates robust speaker localization for continuous speech separation and speaker diarization, where we use speaker directions to group non-contiguous segments of the same speaker. Assuming that speakers do not move and are located in different directions, the direction of arrival (DOA) information provides an informative cue for accurate sequential grouping and speaker diarization. Our system is block-online in the following sense. Given a block of frames with at most two speakers, we apply a two-speaker separation model to separate (and enhance) the speakers, estimate the DOA of each separated speaker, and group the separation results across blocks based on the DOA estimates. Speaker diarization and speaker-attributed speech recognition results on the LibriCSS corpus demonstrate the effectiveness of the proposed algorithm.
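
The cross-block grouping step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each block has already been separated and yields up to two azimuth estimates in degrees, and it uses a simple greedy nearest-direction rule. The function names, the 15-degree threshold, and the circular-mean update are all hypothetical choices made for illustration.

```python
import math


def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def group_blocks_by_doa(block_doas, threshold_deg=15.0):
    """Greedy sequential grouping of separated streams by DOA (illustrative only).

    block_doas: list of blocks; each block is a list of 0 to 2 azimuth
                estimates in degrees, one per separated speaker.
    threshold_deg: hypothetical tolerance for matching an existing speaker.
    Returns (labels, centers): per-block speaker indices and the running
    circular-mean direction of each speaker.
    """
    sums = []    # per speaker: [sum of cosines, sum of sines]
    labels = []
    for doas in block_doas:
        block_labels = []
        for doa in doas:
            # Current direction of every known speaker (circular mean).
            centers = [math.degrees(math.atan2(s, c)) % 360.0 for c, s in sums]
            # Two streams in one block are assumed to be different speakers,
            # so speakers already used in this block are excluded.
            candidates = [
                (angular_distance(doa, ctr), k)
                for k, ctr in enumerate(centers)
                if k not in block_labels
            ]
            best = min(candidates, default=None)
            if best is not None and best[0] <= threshold_deg:
                k = best[1]
            else:
                # No known speaker is close enough: start a new one.
                k = len(sums)
                sums.append([0.0, 0.0])
            sums[k][0] += math.cos(math.radians(doa))
            sums[k][1] += math.sin(math.radians(doa))
            block_labels.append(k)
        labels.append(block_labels)
    centers = [math.degrees(math.atan2(s, c)) % 360.0 for c, s in sums]
    return labels, centers


# Made-up DOA estimates for five consecutive blocks.
blocks = [[30.0], [32.0, 200.0], [198.0], [], [31.0]]
labels, centers = group_blocks_by_doa(blocks)
print(labels)
```

Because the speakers are assumed to be stationary and located in different directions, non-contiguous segments from the same direction receive the same speaker index, which is the cue the abstract relies on for diarization.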


Updated: 2021-07-15
