Preventing DeepFake Attacks on Speaker Authentication by Dynamic Lip Movement Analysis
IEEE Transactions on Information Forensics and Security (IF 6.3). Pub date: 2020-12-18. DOI: 10.1109/tifs.2020.3045937
Chen-Zhao Yang, Jun Ma, Shilin Wang, Alan Wee-Chung Liew

Recent research has demonstrated that lip-based speaker authentication systems can not only achieve good authentication performance but also guarantee liveness. However, with modern DeepFake technology, attackers can produce a talking video of a user without leaving any visually noticeable traces of forgery. This can seriously compromise traditional face-based or lip-based authentication systems. To defend against sophisticated DeepFake attacks, a new visual speaker authentication scheme based on a deep convolutional neural network (DCNN) is proposed in this paper. The proposed network is composed of two functional parts, namely the Fundamental Feature Extraction network (FFE-Net) and the Representative lip feature extraction and Classification network (RC-Net). The FFE-Net provides the fundamental information for speaker authentication. As static lip shape and lip appearance are vulnerable to DeepFake attacks, the FFE-Net emphasizes dynamic lip movement. The RC-Net extracts high-level lip features that distinguish the client from human imposters while capturing the client's talking style. A multi-task learning scheme is designed, and the proposed network is trained end-to-end. Experiments on the GRID and MOBIO datasets demonstrate that the proposed approach achieves accurate authentication against human imposters and is much more robust to DeepFake attacks than three state-of-the-art visual speaker authentication algorithms. It is also worth noting that the proposed approach requires no prior knowledge of the DeepFake spoofing method and can therefore be applied to defend against different kinds of DeepFake attacks.
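The abstract describes the two-part architecture and the multi-task objective only at a high level. The PyTorch sketch below illustrates one plausible realization of the idea: a motion-emphasizing feature extractor feeding a recurrent classifier with two task heads trained jointly. The layer sizes, the temporal-differencing step, the GRU pooling, the auxiliary speaker-identification task, and the loss weighting are all illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class FFENet(nn.Module):
    """Sketch of a Fundamental Feature Extraction network.

    Emphasizes dynamic lip movement (rather than static lip shape or
    appearance) by applying 3D convolutions to frame-to-frame
    differences of the lip region. All sizes are assumed for illustration.
    """
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep the temporal axis
        )

    def forward(self, frames):
        # frames: (batch, 1, T, H, W) grayscale lip-region clips.
        # Temporal differencing foregrounds motion over static appearance.
        motion = frames[:, :, 1:] - frames[:, :, :-1]
        return self.conv3d(motion)  # (batch, feat_dim, T-1, 4, 4)

class RCNet(nn.Module):
    """Sketch of a Representative lip feature extraction and
    Classification network: a GRU summarizes the temporal features,
    then two heads realize a multi-task objective -- client/imposter
    verification plus an assumed auxiliary speaker-identification task.
    """
    def __init__(self, feat_dim=64, hidden=128, num_speakers=33):
        super().__init__()
        self.gru = nn.GRU(feat_dim * 4 * 4, hidden, batch_first=True)
        self.verify_head = nn.Linear(hidden, 2)             # client vs. imposter
        self.speaker_head = nn.Linear(hidden, num_speakers)  # auxiliary task

    def forward(self, feats):
        b, c, t, h, w = feats.shape
        seq = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        _, last = self.gru(seq)           # last hidden state: (1, b, hidden)
        emb = last.squeeze(0)             # per-clip talking-style embedding
        return self.verify_head(emb), self.speaker_head(emb)

# End-to-end multi-task training step on a dummy batch of lip clips.
ffe, rc = FFENet(), RCNet()
clip = torch.randn(8, 1, 25, 64, 96)      # 8 clips, 25 frames, 64x96 lip crops
verify_logits, speaker_logits = rc(ffe(clip))
loss = (nn.functional.cross_entropy(verify_logits, torch.randint(2, (8,)))
        + 0.5 * nn.functional.cross_entropy(speaker_logits, torch.randint(33, (8,))))
loss.backward()
```

The temporal-differencing step stands in for the paper's emphasis on dynamic lip movement: a DeepFake that reproduces a client's static lip appearance frame by frame still has to match the client's characteristic motion pattern to fool the verification head.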

Updated: 2024-08-22