A Tandem Framework Balancing Privacy and Security for Voice User Interfaces
arXiv - CS - Sound, Pub Date: 2021-07-21, DOI: arxiv-2107.10045
Ranya Aloufi, Hamed Haddadi, David Boyle

Speech synthesis, voice cloning, and voice conversion techniques present severe privacy and security threats to users of voice user interfaces (VUIs). These techniques transform one or more elements of a speech signal, e.g., identity and emotion, while preserving linguistic information. Adversaries may use advanced transformation tools to mount a spoofing attack using fraudulent biometrics of a legitimate speaker. Conversely, such techniques have been used to generate privacy-transformed speech by suppressing personally identifiable attributes in the voice signal, achieving anonymization. Prior work has studied the security and privacy vectors in parallel, which raises the alarm that if a benign user can achieve privacy through a transformation, a malicious user can likewise break security by bypassing the anti-spoofing mechanism. In this paper, we take a step towards balancing two seemingly conflicting requirements: security and privacy. It remains unclear what the vulnerabilities in one domain imply for the other, and what dynamic interactions exist between them. A better understanding of these aspects is crucial for assessing and mitigating the vulnerabilities inherent in VUIs and for building effective defenses. Specifically, (i) we investigate the applicability of current voice anonymization methods by deploying a tandem framework that jointly combines anti-spoofing and authentication models, and we evaluate the performance of these methods; (ii) drawing on analytical and empirical evidence, we reveal a duality between the two mechanisms, as they offer different ways to achieve the same objective, and we show that leveraging one vector significantly amplifies the effectiveness of the other; (iii) we demonstrate that to defend effectively against potential attacks on VUIs, it is necessary to investigate the attacks from multiple complementary perspectives (security and privacy).
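
The abstract does not spell out how the tandem framework combines its two models. As a rough illustration only, the sketch below assumes a simple cascade in which an utterance is accepted only when an anti-spoofing score and a speaker-verification score both clear a threshold. The names `tandem_decide`, `cm_score`, `asv_score`, and the default thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class TandemDecision:
    accepted: bool
    reason: str


def tandem_decide(cm_score: float, asv_score: float,
                  cm_threshold: float = 0.5,
                  asv_threshold: float = 0.5) -> TandemDecision:
    """Hypothetical cascade: anti-spoofing check first, then speaker verification.

    cm_score  -- assumed countermeasure score, higher means more likely bona fide
    asv_score -- assumed verification score, higher means closer to the claimed speaker
    Both thresholds are illustrative placeholders, not values from the paper.
    """
    if cm_score < cm_threshold:
        return TandemDecision(False, "rejected by anti-spoofing")
    if asv_score < asv_threshold:
        return TandemDecision(False, "rejected by speaker verification")
    return TandemDecision(True, "accepted")


if __name__ == "__main__":
    # Made-up scores, chosen only to exercise each branch.
    print(tandem_decide(0.9, 0.9))
    print(tandem_decide(0.1, 0.9))
    print(tandem_decide(0.9, 0.1))
```

In a cascade like this, privacy-transformed speech from a benign user passes through the same two checks as spoofed speech from an adversary, which is the tension between privacy and security that the abstract describes.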

Updated: 2021-07-22