Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias, arXiv - CS - Sound


Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias
arXiv - CS - Sound, Pub Date: 2020-09-21, DOI: arxiv-2009.09556
Mufan Sang, Wei Xia, John H.L. Hansen

In forensic applications, often only small naturalistic datasets are available, consisting of short utterances recorded in complex or unknown acoustic environments. In this study, we propose a pipeline solution to improve speaker verification on a small real-world forensic field dataset. Leveraging large-scale out-of-domain datasets, we propose a knowledge-distillation-based objective function for teacher-student learning and apply it to short-utterance forensic speaker verification. The objective function jointly considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings. To make the trained deep speaker embedding network robust on a small target dataset, we introduce a novel strategy that fine-tunes the pre-trained student model towards the forensic target domain, using that model both as the fine-tuning starting point and as a reference in regularization. The proposed approaches are evaluated on the 1st48-UTD forensic corpus, a newly established naturalistic dataset of actual homicide investigations consisting of short utterances recorded in uncontrolled conditions. We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances and that our fine-tuning strategy outperforms the commonly used weight decay method by providing an explicit inductive bias towards the pre-trained model.
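The abstract names the ingredients of the objective (speaker classification loss, Kullback-Leibler divergence, embedding similarity) and contrasts the fine-tuning regularizer with weight decay, but it gives no formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes a weighted sum of the three terms, cosine distance as the embedding-similarity term, and a squared L2 distance to the pre-trained weights as the regularizer. All function names, the weights `alpha`, `beta`, `lam`, and the `temperature` parameter are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Speaker classification loss for one sample."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_distance(a, b):
    """1 - cosine similarity; assumed form of the embedding term."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distillation_loss(student_logits, teacher_logits, student_emb, teacher_emb,
                      label, alpha=1.0, beta=1.0, temperature=1.0):
    """Assumed weighted sum of the three terms named in the abstract."""
    ce = cross_entropy(student_logits, label)
    kl = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    sim = cosine_distance(student_emb, teacher_emb)
    return ce + alpha * kl + beta * sim


def weight_decay_penalty(weights, lam):
    """Standard weight decay: pulls the weights towards zero."""
    return lam * sum(w * w for w in weights)


def pretrained_reference_penalty(weights, pretrained, lam):
    """Assumed regularizer: pulls the weights towards the pre-trained values."""
    return lam * sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))
```

Under these assumptions, a student that matches the teacher exactly pays only the classification loss, and a fine-tuned model that stays at its pre-trained weights pays no reference penalty, whereas weight decay would still penalize it for being far from zero. That difference is one way to picture the "explicit inductive bias towards the pre-trained model" described above.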


Updated: 2020-09-22
