SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems
arXiv - CS - Sound, Pub Date: 2020-07-13, DOI: arxiv-2007.06622
Hadi Abdullah, Kevin Warren, Vincent Bindschaedler, Nicolas Papernot, and Patrick Traynor

Speech and speaker recognition systems are employed in a variety of applications, from personal assistants to telephony surveillance and biometric authentication. The wide deployment of these systems has been made possible by the improved accuracy of neural networks. Recent research has demonstrated that speech and speaker recognition systems, like other systems based on neural networks, are vulnerable to attacks using manipulated inputs. However, as we demonstrate in this paper, the end-to-end architecture of speech and speaker systems and the nature of their inputs make attacks and defenses against them substantially different from those in the image space. We demonstrate this first by systematizing existing research in this space and providing a taxonomy through which the community can evaluate future work. We then demonstrate experimentally that attacks against these models almost universally fail to transfer. In so doing, we argue that substantial additional work is required to provide adequate mitigations in this space.

Updated: 2020-07-22