Human Perception of Audio Deepfakes, arXiv - CS - Sound


Human Perception of Audio Deepfakes
arXiv - CS - Sound, Pub Date: 2021-07-20, DOI: arxiv-2107.09667
Nicolas M. Müller, Karla Markert, Konstantin Böttinger

The recent emergence of deepfakes, computerized realistic multimedia fakes, has brought the detection of manipulated and generated content to the forefront. While many machine learning models for deepfake detection have been proposed, human detection capabilities remain far less explored. This is of special importance because human perception differs from machine perception and deepfakes are generally designed to fool humans. So far, this issue has been addressed only for images and video. To compare the ability of humans and machines to detect audio deepfakes, we conducted an online gamified experiment in which we asked users to distinguish bona fide audio samples from spoofed audio generated with a variety of algorithms. In total, 200 users competed over 8976 game rounds against an artificial intelligence (AI) algorithm trained for audio deepfake detection. From the collected data, we found that the machine generally outperforms humans in detecting audio deepfakes, but that the converse holds for a certain attack type, for which humans are still more accurate. Furthermore, we found that younger participants are on average better at detecting audio deepfakes than older participants, while IT professionals hold no advantage over laymen. We conclude that it is important to combine human and machine knowledge in order to improve audio deepfake detection.


Updated: 2021-07-22
