LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild
Speech Communication (IF 3.2), Pub Date: 2023-12-24, DOI: 10.1016/j.specom.2023.103028
Zhipeng Chen, Xinheng Wang, Lun Xie, Haijie Yuan, Hang Pan

Audio-driven talking head generation has attracted growing research interest. Its primary challenge is lip synchronization: achieving audio-visual coherence between the lips and the audio. This paper proposes a generic method, LPIPS-AttnWav2Lip, for reconstructing face images of any speaker based on audio. We use a U-Net architecture based on residual CBAM to better encode and fuse audio and visual modal information. Additionally, a semantic alignment module extends the receptive field of the generator network to obtain the spatial and channel information of the visual features efficiently, and matches the statistical information of the visual features with the audio latent vector so that the audio content information adjusts, and is injected into, the visual information. To achieve exact lip synchronization and generate realistic, high-quality images, our approach adopts the LPIPS loss, which simulates human judgment of image quality and reduces the possibility of instability during training. Subjective and objective evaluation results demonstrate that the proposed method achieves outstanding performance in lip synchronization accuracy and visual quality.
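
The abstract does not spell out how the statistics of the visual features are matched with the audio latent vector. One common way to realize this kind of injection is an AdaIN-style modulation: normalize each visual channel to zero mean and unit variance, then rescale and shift it with values derived from the audio. The sketch below illustrates that idea only and is not the authors' implementation. The names `modulate_channel` and `inject_audio` are hypothetical, and the per-channel scales and shifts are passed in as plain numbers, whereas a real model would presumably predict them from the audio latent vector with learned layers.

```python
from statistics import fmean, pstdev


def modulate_channel(values, scale, shift, eps=1e-5):
    """Normalize one feature channel, then apply an audio-derived scale and shift.

    `values` is a flat list of activations for a single channel.
    `scale` and `shift` are assumed to come from the audio branch (hypothetical).
    """
    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population standard deviation of the channel
    return [scale * (v - mean) / (std + eps) + shift for v in values]


def inject_audio(visual_features, audio_scales, audio_shifts):
    """Apply the modulation channel by channel.

    `visual_features` is a list of channels, each a flat list of floats.
    One (scale, shift) pair is expected per channel.
    """
    if not (len(visual_features) == len(audio_scales) == len(audio_shifts)):
        raise ValueError("need exactly one scale and one shift per channel")
    return [
        modulate_channel(channel, scale, shift)
        for channel, scale, shift in zip(visual_features, audio_scales, audio_shifts)
    ]


if __name__ == "__main__":
    features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]]
    modulated = inject_audio(features, audio_scales=[2.0, 1.0], audio_shifts=[0.5, -1.0])
    for channel in modulated:
        print(channel)
```

After modulation, each channel's mean equals its shift and its standard deviation is approximately its scale, so the audio-derived values fully determine the first- and second-order statistics of the visual features. A constant channel carries no variation to rescale and collapses to the shift value.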




Updated: 2023-12-28