Unsupervised adaptation of PLDA models for broadcast diarization
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2019-12-01, DOI: 10.1186/s13636-019-0167-7
Ignacio Viñals, Alfonso Ortega, Jesús Villalba, Antonio Miguel, Eduardo Lleida

We present a novel model adaptation approach to deal with data variability in speaker diarization for broadcast environments. Expensive human-annotated data can mitigate domain mismatch through supervised model adaptation. By contrast, we propose an unsupervised adaptation method that requires no in-domain labeled data, only the recording being diarized. We rely on an inner adaptation block that combines Agglomerative Hierarchical Clustering (AHC) and Mean-Shift (MS) clustering with a Fully Bayesian Probabilistic Linear Discriminant Analysis (PLDA) to produce pseudo-speaker labels suitable for model adaptation. Building on this basic block, we propose several adaptation approaches, both unsupervised and semi-supervised. Evaluated on the Multi-Genre Broadcast 2015 (MGB) dataset, the proposed solutions yield a 16% relative improvement over the baseline and also outperform a low-resource supervised adaptation (9% relative improvement). Furthermore, the proposed unsupervised adaptation is fully compatible with supervised adaptation: using both techniques jointly gives a 13% relative improvement over supervised adaptation alone.
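The pseudo-speaker labelling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: cosine similarity stands in for PLDA scoring, the Mean-Shift stage is omitted, and the function names, the toy two-dimensional embeddings, and the 0.5 stopping threshold are all assumptions made for illustration. It only shows how agglomerative clustering over per-segment embeddings yields one label per segment that a later adaptation stage could consume.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ahc_pseudo_labels(embeddings, threshold=0.5):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Repeatedly merges the most similar pair of clusters and stops when the
    best pair scores below `threshold`. Cosine similarity is an assumed
    stand-in for the PLDA scores used in the paper.
    Returns one integer pseudo-speaker label per input embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        scores = [cosine(embeddings[i], embeddings[j]) for i in a for j in b]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        best_score, best_pair = -2.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_score < threshold:
            break  # remaining clusters are treated as distinct speakers
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy per-segment embeddings (made up): segments 0, 1, 4 point one way,
# segments 2 and 3 another.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.0]]
labels = ahc_pseudo_labels(segments, threshold=0.5)
print(labels)  # [0, 0, 1, 1, 0]
```

The stopping threshold controls how many pseudo-speakers are produced: a very high value leaves every segment in its own cluster, while a very low value merges everything into a single speaker.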

Updated: 2019-12-01