A simulation study on optimal scores for speaker recognition
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2020-11-25, DOI: 10.1186/s13636-020-00183-3
Dong Wang

In this article, we conduct a comprehensive simulation study of the optimal scores for speaker recognition systems based on speaker embeddings. To this end, we first revisit the optimal scores for the speaker identification (SI) and speaker verification (SV) tasks in the sense of minimum Bayes risk (MBR), and show that the optimal scores for both tasks can be formulated in a single form, the normalized likelihood (NL). We show that when the underlying model is linear Gaussian, the NL score is mathematically equivalent to the PLDA likelihood ratio (LR), and that the empirical scores based on cosine distance and Euclidean distance can be regarded as approximations of this linear Gaussian NL score under certain conditions.

Based on the unified NL score, we conduct a comprehensive simulation study to investigate the behavior of the scoring component on both the SI and SV tasks, both when the distribution of the speaker vectors perfectly matches the assumptions of the NL model and when some mismatch is involved. Importantly, our simulation is based on the statistics of speaker vectors derived from a practical speaker recognition system, and hence reflects the behavior of NL scoring in real-life scenarios full of imperfections, including non-Gaussianity, non-homogeneity, and domain/condition mismatch.
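The equivalence between the NL score and the PLDA LR under a linear Gaussian model can be checked numerically. The sketch below is illustrative only and makes its own assumptions: speaker vectors live in a whitened space where the speaker mean is drawn as mu ~ N(0, I) and each observation is x ~ N(mu, sigma2 * I). The helper names `log_gauss`, `nl_score`, `same_speaker_cov` and `plda_llr`, as well as the parameter `sigma2`, are hypothetical and do not come from the paper. `nl_score` computes log p_k(x) - log p(x) from the closed-form posterior predictive of speaker k. `plda_llr` computes the two-covariance PLDA log LR directly from joint Gaussian likelihoods. By Bayes' rule, p_k(x) = p(enroll, x) / p(enroll), so the two values should agree to floating-point precision.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of the multivariate Gaussian N(x; mean, cov)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff.size * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def nl_score(enroll, x, sigma2):
    """Hypothetical helper: log NL = log p_k(x) - log p(x), assuming
    mu ~ N(0, I) and x | mu ~ N(mu, sigma2 * I)."""
    n, d = enroll.shape
    post_mean = enroll.sum(axis=0) / (n + sigma2)  # equals n * xbar / (n + sigma2)
    pred_var = sigma2 + sigma2 / (n + sigma2)      # within-speaker + posterior variance
    log_pk = log_gauss(x, post_mean, pred_var * np.eye(d))          # speaker-k predictive
    log_p = log_gauss(x, np.zeros(d), (1.0 + sigma2) * np.eye(d))  # population marginal
    return log_pk - log_p

def same_speaker_cov(m, d, sigma2):
    """Joint covariance of m stacked vectors from one speaker:
    shared between-speaker part I plus private within-speaker part sigma2 * I."""
    return np.kron(np.ones((m, m)), np.eye(d)) + np.kron(np.eye(m), sigma2 * np.eye(d))

def plda_llr(enroll, x, sigma2):
    """Hypothetical helper: two-covariance PLDA log LR,
    log p(enroll, x | same) - log p(enroll) - log p(x)."""
    n, d = enroll.shape
    joint = np.concatenate([enroll.ravel(), x])
    log_same = log_gauss(joint, np.zeros((n + 1) * d), same_speaker_cov(n + 1, d, sigma2))
    log_enroll = log_gauss(enroll.ravel(), np.zeros(n * d), same_speaker_cov(n, d, sigma2))
    log_test = log_gauss(x, np.zeros(d), same_speaker_cov(1, d, sigma2))
    return log_same - log_enroll - log_test

# Usage: draw synthetic data from the assumed model and score one target
# trial and one non-target trial.
rng = np.random.default_rng(0)
d, n, sigma2 = 4, 3, 0.5
mu = rng.standard_normal(d)
enroll = mu + np.sqrt(sigma2) * rng.standard_normal((n, d))
x_target = mu + np.sqrt(sigma2) * rng.standard_normal(d)
x_nontarget = rng.standard_normal(d) + np.sqrt(sigma2) * rng.standard_normal(d)

for x in (x_target, x_nontarget):
    print(nl_score(enroll, x, sigma2), plda_llr(enroll, x, sigma2))
```

Each printed pair should match. This agreement holds only because the data were generated from the assumed model. The simulations described above also examine what happens when real speaker vectors deviate from it.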

Updated: 2020-11-25