Liquid Warping GAN With Attention: A Unified Framework for Human Image Synthesis
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 2021-05-08. DOI: 10.1109/tpami.2021.3078270
Wen Liu, Zhixin Piao, Zhi Tu, Wenhan Luo, Lin Ma, Shenghua Gao

We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis, within a unified framework: once trained, the model can handle all of these tasks. Existing task-specific methods mainly use 2D keypoints (pose) to estimate the human body structure. However, keypoints express only position information and can neither characterize the personalized shape of the person nor model limb rotations. In this paper, we propose a 3D body mesh recovery module to disentangle pose and shape; it models not only joint locations and rotations but also the personalized body shape. To preserve source information such as texture, style, color, and face identity, we propose an Attentional Liquid Warping GAN with an Attentional Liquid Warping Block (AttLWB) that propagates the source information in both image and feature spaces to the synthesized reference. Specifically, the source features are extracted by a denoising convolutional auto-encoder so as to characterize the source identity well. Furthermore, our method supports more flexible warping from multiple sources. To further improve generalization to unseen source images, one/few-shot adversarial learning is applied: the model is first trained on an extensive training set and then fine-tuned on one or a few unseen images in a self-supervised way to generate high-resolution (512×512 and 1024×1024) results. We also build a new dataset, the Impersonator (iPER) dataset, for evaluating human motion imitation, appearance transfer, and novel view synthesis. Extensive experiments demonstrate the effectiveness of our methods in preserving face identity, shape consistency, and clothing details. All code and the dataset are available at https://impersonator.org/work/impersonator-plus-plus.html.
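To make the multi-source warping-and-fusion idea concrete, below is a minimal PyTorch sketch of attention-based fusion in the spirit of AttLWB: features from several source images are warped into the target pose via per-source sampling grids (e.g., derived from 3D mesh correspondences), then fused with softmax attention weights predicted at each spatial location. All class names, shapes, and the flow representation here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalFusion(nn.Module):
    """Hypothetical sketch: fuse multiple warped source feature maps
    with per-pixel softmax attention over the source dimension."""

    def __init__(self, channels: int, num_sources: int):
        super().__init__()
        # Predict one attention logit per source at every spatial location.
        self.attn = nn.Conv2d(channels * num_sources, num_sources, kernel_size=1)

    def forward(self, src_feats, flows):
        """
        src_feats: list of (B, C, H, W) feature maps, one per source image.
        flows:     list of (B, H, W, 2) sampling grids in [-1, 1] mapping
                   target-space locations back to each source.
        """
        # Warp each source feature map into the target pose.
        warped = [F.grid_sample(f, g, align_corners=True)
                  for f, g in zip(src_feats, flows)]
        # Softmax over sources gives per-pixel fusion weights (B, S, H, W).
        weights = torch.softmax(self.attn(torch.cat(warped, dim=1)), dim=1)
        stacked = torch.stack(warped, dim=1)                   # (B, S, C, H, W)
        fused = (weights.unsqueeze(2) * stacked).sum(dim=1)    # (B, C, H, W)
        return fused

# Toy usage: two sources with identity grids (i.e., no actual warping).
B, C, H, W = 1, 64, 32, 32
feats = [torch.randn(B, C, H, W) for _ in range(2)]
grid = F.affine_grid(torch.eye(2, 3).unsqueeze(0), (B, C, H, W), align_corners=True)
fusion = AttentionalFusion(channels=C, num_sources=2)
out = fusion(feats, [grid, grid])   # -> (1, 64, 32, 32)
```

The design choice illustrated here is that fusion weights are predicted jointly from all warped features, so the network can prefer, per pixel, whichever source view best covers that body region.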

Updated: 2021-05-08