A Convolutional LSTM based Residual Network for Deepfake Video Detection, arXiv - CS - Multimedia

A Convolutional LSTM based Residual Network for Deepfake Video Detection
arXiv - CS - Multimedia. Pub Date: 2020-09-16, DOI: arxiv-2009.07480
Shahroz Tariq, Sangyup Lee and Simon S. Woo

In recent years, deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can learn to generate deepfake videos from only a few images of a victim or target. This creates a significant social problem for everyone whose photos are publicly available on the Internet, especially on social media websites. Several deep learning-based detection methods have been developed to identify these deepfakes. However, these methods lack generalizability: they perform well only on a specific type of deepfake method and do not transfer to detecting others. They also do not take advantage of the temporal information in the video. In this paper, we address these limitations. We develop a Convolutional LSTM based Residual Network (CLRNet), which takes a sequence of consecutive frames from a video as input and learns temporal information that helps detect the unnatural-looking artifacts present between frames of deepfake videos. We also propose a transfer learning-based approach to generalize across different deepfake methods. Through rigorous experiments on the FaceForensics++ dataset, we show that our method outperforms five previously proposed state-of-the-art deepfake detection methods, generalizing better at detecting different deepfake methods with the same model.
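
The abstract says the network consumes a sequence of consecutive frames rather than single images. As a rough illustration of that input step only, the following sketch groups a video's frames into overlapping clips. The function name `consecutive_clips` and the `clip_len` and `stride` parameters are assumptions made for this example; the abstract does not state the clip length or sampling scheme the authors used, and this is not their preprocessing code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def consecutive_clips(frames: Sequence[T], clip_len: int, stride: int = 1) -> List[List[T]]:
    """Group frames into clips of `clip_len` consecutive frames.

    `frames` can hold anything that stands for one frame (an image array,
    a file path, a frame index). `clip_len` and `stride` are hypothetical
    parameters chosen for illustration.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    last_start = len(frames) - clip_len
    # When the video is shorter than one clip, the range is empty
    # and no clips are produced.
    return [list(frames[s:s + clip_len]) for s in range(0, last_start + 1, stride)]


# Example: six frame indices grouped into clips of three consecutive frames.
clips = consecutive_clips(list(range(6)), clip_len=3, stride=1)
```

Each clip could then be stacked into a single tensor and passed to a temporal model such as a convolutional LSTM, so that the model sees how pixels change from one frame to the next.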

Updated: 2020-09-17
