Optimizing Tandem Speaker Verification and Anti-Spoofing Systems, IEEE/ACM Transactions on Audio, Speech, and Language Processing


IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2021-12-28, DOI: 10.1109/taslp.2021.3138681
Anssi Kanervisto, Ville Hautamaki, Tomi Kinnunen, Junichi Yamagishi

Automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, so they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, and the ASV can then determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with a tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, using different metrics and data, so their combined performance is not optimized. In this work, we propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning. The results indicate that these approaches outperform fine-tuning: our method provides a 20% relative improvement in t-DCF on the ASVSpoof19 dataset in a constrained setting.
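
To make the tandem idea concrete, the following is a minimal sketch and not the authors' implementation. It shows a hard CM-then-ASV cascade next to a sigmoid-relaxed version whose cost varies smoothly with the scores, which is the general kind of relaxation a differentiable cost relies on. The function names, the `sharpness` parameter, the weights `w_miss` and `w_fa`, and the toy scores are all assumptions made for illustration; the cost here is a generic weighted error sum, not the t-DCF as defined in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tandem_accept(cm_scores, asv_scores, cm_thr, asv_thr):
    """Hard cascade: accept only if the CM passes the trial AND the ASV accepts it."""
    return (cm_scores > cm_thr) & (asv_scores > asv_thr)

def soft_tandem_accept(cm_scores, asv_scores, cm_thr, asv_thr, sharpness=10.0):
    """Smooth relaxation: product of two sigmoid 'soft decisions' (illustrative)."""
    cm_pass = sigmoid(sharpness * (cm_scores - cm_thr))
    asv_pass = sigmoid(sharpness * (asv_scores - asv_thr))
    return cm_pass * asv_pass

def soft_cost(accept_prob, is_target, w_miss=1.0, w_fa=1.0):
    """Weighted soft miss rate plus soft false-accept rate (hypothetical weights)."""
    miss = np.mean(1.0 - accept_prob[is_target])
    false_accept = np.mean(accept_prob[~is_target])
    return w_miss * miss + w_fa * false_accept

# Toy trials: target, non-target, spoof, target (made-up scores).
cm_scores = np.array([2.0, 1.5, -1.0, 1.8])
asv_scores = np.array([3.0, -2.0, 2.5, 2.2])
is_target = np.array([True, False, False, True])

hard = tandem_accept(cm_scores, asv_scores, cm_thr=0.0, asv_thr=0.0)
soft = soft_tandem_accept(cm_scores, asv_scores, cm_thr=0.0, asv_thr=0.0)
cost = soft_cost(soft, is_target)
```

Because every step of the soft version is a smooth function of the scores, the same expressions written in an automatic-differentiation framework would yield gradients with respect to whatever produces those scores, whereas the hard thresholds of `tandem_accept` give zero gradient almost everywhere.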


Updated: 2021-12-28
