Channel-Wise Spatiotemporal Aggregation Technology for Face Video Forensics
Security and Communication Networks ( IF 1.968 ) Pub Date : 2021-08-29 , DOI: 10.1155/2021/5524930
Yujiang Lu 1 , Yaju Liu 2 , Jianwei Fei 2 , Zhihua Xia 3

Recent progress in deep learning, particularly in generative models, has made it easier to synthesize sophisticated forged faces in videos, posing severe threats to personal privacy and reputation on social media. It is therefore necessary to develop forensic approaches that distinguish forged videos from authentic ones. Existing work concentrates on frame-level cues and makes insufficient use of the rich temporal information in video. Although some approaches identify forgeries from the perspective of motion inconsistency, a promising spatiotemporal feature fusion strategy is still lacking. To this end, we propose the Channel-Wise Spatiotemporal Aggregation (CWSA) module, which fuses deep features of consecutive video frames without any recurrent units. Our approach starts by cropping the face region while retaining some background, which shifts the learning objective from the manipulations themselves to the difference between pristine and manipulated pixels. A deep convolutional neural network (CNN) with skip connections, which help preserve low-level features useful for detection, then extracts frame-level features. Finally, the CWSA module makes the real-or-fake decision by aggregating the deep features of the frame sequence. Evaluation on several large facial video manipulation benchmarks demonstrates its effectiveness. On all three datasets, FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms state-of-the-art methods by a significant margin.
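
The abstract gives no implementation details, so the following is only a rough sketch of two of the steps it names: cropping a face box with some background kept, and fusing per-frame features channel by channel without recurrent units. Everything here is an assumption made for illustration: the function names `expand_box` and `aggregate_channelwise`, the 30% margin, and the per-channel softmax weighting are hypothetical and should not be read as the authors' actual CWSA design.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), in pixels


def expand_box(box: Box, img_w: int, img_h: int, margin: float = 0.3) -> Box:
    """Grow a face box by `margin` of its size on each side, clamped to the image.

    The margin value is an illustrative assumption, not taken from the paper.
    """
    left, top, right, bottom = box
    dx = int(round((right - left) * margin))
    dy = int(round((bottom - top) * margin))
    return (max(0, left - dx), max(0, top - dy),
            min(img_w, right + dx), min(img_h, bottom + dy))


def aggregate_channelwise(features: List[List[float]]) -> List[float]:
    """Fuse T per-frame feature vectors (each of length C) into one length-C vector.

    Hypothetical stand-in for a channel-wise aggregation: for each channel,
    frames are weighted by a softmax over that channel's own activations,
    so no recurrent state is carried from frame to frame.
    """
    if not features:
        raise ValueError("need at least one frame")
    n_channels = len(features[0])
    fused = []
    for c in range(n_channels):
        column = [frame[c] for frame in features]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        fused.append(sum(v * e / total for v, e in zip(column, exps)))
    return fused
```

In a real pipeline the per-frame vectors would come from the CNN backbone and the fused vector would feed a classifier; both are omitted here because the abstract does not specify them.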

Updated: 2021-08-29