In defence of metric learning for speaker recognition
arXiv - CS - Sound, Pub Date: 2020-03-26, DOI: arxiv-2003.11982
Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han

The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should condense information into a compact utterance-level representation with small intra-speaker and large inter-speaker distances. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper, we present an extensive evaluation of the most popular loss functions for speaker recognition on the VoxCeleb dataset. We demonstrate that the vanilla triplet loss is competitive with classification-based losses, and that networks trained with our proposed metric learning objective outperform state-of-the-art methods.
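
For illustration, a vanilla triplet loss on a single (anchor, positive, negative) triple of utterance embeddings might be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the L2 normalization step, the squared Euclidean distance, and the margin of 0.3 are all assumptions chosen for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss for one triple of embeddings.

    anchor and positive are assumed to come from the same speaker,
    negative from a different speaker. The margin value is an
    illustrative assumption, not a setting taken from the paper.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    # Penalize the triple unless the negative is at least `margin`
    # farther from the anchor than the positive is.
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


if __name__ == "__main__":
    # Toy 2-D embeddings: the positive points the same way as the anchor,
    # the negative is orthogonal to it.
    print(triplet_loss([1.0, 0.0], [2.0, 0.0], [0.0, 1.0]))
```

The loss is zero once the intra-speaker distance is smaller than the inter-speaker distance by at least the margin, which is the property the abstract asks of ideal embeddings.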

Updated: 2020-11-05