


Creation and Detection of German Voice Deepfakes
arXiv - CS - Cryptography and Security. Pub Date: 2021-08-02, DOI: arxiv-2108.01469
Vanessa Barnekow, Dominik Binder, Niclas Kromrey, Pascal Munaretto, Andreas Schaad, Felix Schmieder

Synthesizing voice with the help of machine learning techniques has made rapid progress in recent years [1], and the first high-profile fraud cases have recently been reported [2]. Given the current increase in the use of conferencing tools for online teaching, we question just how easy (i.e. in terms of required data, hardware, and skill set) it would be to create a convincing voice fake. We analyse how much training data a participant (e.g. a student) would actually need to fake another participant's voice (e.g. a professor's). We provide an analysis of the existing state of the art in creating voice deepfakes, and offer detailed technical guidance and evidence of just how much effort is needed to copy a voice. A user study with more than 100 participants shows how difficult it is to tell real and fake voices apart (on average, only 37 percent can distinguish between the real and the fake voice of a professor). With a focus on the German language and an online teaching environment, we discuss the societal implications and demonstrate how machine learning techniques might be used to detect such fakes.


Updated: 2021-08-04
