Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
arXiv - CS - Sound Pub Date : 2020-12-29 , DOI: arxiv-2012.14952
Federico Landini, Ján Profant, Mireia Diez, Lukáš Burget

The recently proposed VBx diarization method uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors. In this work, we extensively compare VBx with other approaches in the literature and show that it achieves superior performance on three of the most popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II. Further, we present for the first time the derivation and update formulae for the VBx model, focusing on its efficiency and simplicity compared to the previous, more complex BHMM model, which operates on frame-by-frame standard cepstral features. Together with this publication, we release the recipe for training the x-vector extractors used in our experiments on both wideband and narrowband data, and the VBx recipes that attain state-of-the-art performance on all three datasets. In addition, we point out the lack of a standardized evaluation protocol for the AMI dataset and propose a new protocol for both Beamformed and Mix-Headset audio, based on the official AMI partitions and transcriptions.

Updated: 2021-01-01