Xi-Vector Embedding for Speaker Recognition
IEEE Signal Processing Letters (IF 3.9), published 2021-06-23, DOI: 10.1109/lsp.2021.3091932
Kong Aik Lee , Qiongqiong Wang , Takafumi Koshinaka

We present a Bayesian formulation for deep speaker embedding, in which the xi-vector is the Bayesian counterpart of the x-vector that takes the uncertainty estimate into account. On the technology front, we offer a simple and straightforward extension to the now widely used x-vector. It consists of an auxiliary neural net that predicts the frame-wise uncertainty of the input sequence. We show that the proposed extension leads to substantial improvement across all operating points, with a significant reduction in error rates and detection cost. On the theoretical front, our proposal integrates the Bayesian formulation of the linear Gaussian model into speaker-embedding neural networks via the pooling layer. In one sense, our proposal integrates the Bayesian formulation of the i-vector with that of the x-vector. Hence, we refer to the embedding as the xi-vector, which is pronounced as /zai/ vector. Experimental results on the SITW evaluation set show a consistent improvement of over 17.5% in equal error rate and 10.9% in minimum detection cost.
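The pooling idea in the abstract (frame-wise uncertainty combined through a linear Gaussian model) can be illustrated with standard Gaussian posterior inference. The sketch below is not the authors' implementation; it assumes diagonal covariances, treats each frame-level vector z_t as a noisy observation of a latent embedding h with precision l_t, and places a Gaussian prior N(mu_p, diag(1/l_p)) on h. The function name `gaussian_posterior_pool` and its parameters are hypothetical, and the frame precisions are simply passed in, standing in for what an auxiliary uncertainty network would predict.

```python
from typing import List, Sequence, Tuple


def gaussian_posterior_pool(
    frames: Sequence[Sequence[float]],
    frame_precisions: Sequence[Sequence[float]],
    prior_mean: Sequence[float],
    prior_precision: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Precision-weighted pooling under an assumed linear Gaussian model.

    Assumed model (diagonal covariances):
        h   ~ N(prior_mean, diag(1 / prior_precision))
        z_t ~ N(h, diag(1 / frame_precisions[t]))
    Returns the posterior mean and posterior precision of h.
    """
    dim = len(prior_mean)
    # Posterior precision starts from the prior precision ...
    post_prec = list(prior_precision)
    # ... and the precision-weighted sum starts from the prior term l_p * mu_p.
    weighted_sum = [p * m for p, m in zip(prior_precision, prior_mean)]
    for z, l in zip(frames, frame_precisions):
        if len(z) != dim or len(l) != dim:
            raise ValueError("frame and precision vectors must match the prior dimension")
        for d in range(dim):
            post_prec[d] += l[d]          # precisions add up across frames
            weighted_sum[d] += l[d] * z[d]  # confident frames contribute more
    # Posterior mean = (l_p * mu_p + sum_t l_t * z_t) / (l_p + sum_t l_t), per dimension.
    post_mean = [s / p for s, p in zip(weighted_sum, post_prec)]
    return post_mean, post_prec


mean, prec = gaussian_posterior_pool(
    frames=[[1.0, 2.0], [3.0, 0.0]],
    frame_precisions=[[1.0, 1.0], [3.0, 1.0]],
    prior_mean=[0.0, 0.0],
    prior_precision=[1.0, 1.0],
)
print(mean, prec)
```

With a standard-normal prior, the second frame's larger precision in the first dimension pulls that posterior mean to 2.0. If the prior precision is set to zero and all frame precisions are equal, the posterior mean reduces to the plain frame average, which is the mean component of the statistics pooling in a conventional x-vector network.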

Last updated: 2021-07-20