Unpaired Person Image Generation With Semantic Parsing Transformation
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6). Pub Date: 2020-05-04. DOI: 10.1109/tpami.2020.2992105
Sijie Song, Wei Zhang, Jiaying Liu, Zongming Guo, Tao Mei

In this paper, we tackle the problem of pose-guided person image generation with unpaired data, which is challenging due to non-rigid spatial deformation. Instead of directly learning a fixed mapping between human bodies as previous methods do, we propose a new pathway that decomposes this single fixed mapping into two subtasks: semantic parsing transformation and appearance generation. First, to simplify the learning of non-rigid deformation, a semantic generative network is developed to transform semantic parsing maps between different poses. Second, guided by the semantic parsing maps, we render the foreground and background separately: a foreground generative network learns to synthesize semantic-aware textures, while a background generative network learns to inpaint background regions exposed by pose changes. Third, we enable pseudo-label training with unpaired data and demonstrate that end-to-end training of the overall network further refines the semantic map prediction and, accordingly, the final results. Moreover, our method generalizes to other person image generation tasks defined on semantic maps, e.g., clothing texture transfer, controlled image manipulation, and virtual try-on. Experimental results on the DeepFashion and Market-1501 datasets demonstrate the superiority of our method, especially in preserving body shapes and clothing attributes and in rendering structure-coherent backgrounds.
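The abstract describes rendering the foreground and background separately under the guidance of a semantic parsing map, then producing the final image. A minimal sketch of the final compositing step is shown below; the function name `compose`, the label convention (0 = background), and the hard mask-blend formula are illustrative assumptions, not the paper's actual implementation, which uses learned generative networks for both streams.

```python
import numpy as np

def compose(foreground, background, parsing_map, fg_labels):
    """Blend foreground and background renderings using a semantic parsing map.

    foreground, background: H x W x 3 float arrays in [0, 1]
    parsing_map: H x W integer label map (assumed convention: 0 = background)
    fg_labels: labels treated as person/foreground (e.g. hair, face, clothing)
    """
    # Binary person mask derived from the target-pose parsing map.
    mask = np.isin(parsing_map, list(fg_labels)).astype(np.float32)[..., None]
    # Foreground texture where the mask is 1, inpainted background elsewhere.
    return mask * foreground + (1.0 - mask) * background

# Toy example: 4x4 image whose left half is labelled as person (label 1).
H = W = 4
fg = np.ones((H, W, 3), dtype=np.float32)   # stand-in foreground texture
bg = np.zeros((H, W, 3), dtype=np.float32)  # stand-in inpainted background
parsing = np.zeros((H, W), dtype=np.int64)
parsing[:, :2] = 1                          # left half is foreground
out = compose(fg, bg, parsing, fg_labels={1})
```

Because the mask comes from the transformed parsing map of the target pose, errors in the semantic transformation stage propagate directly into the composite, which is why the end-to-end refinement of the semantic maps matters.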
