Sketch-guided Deep Portrait Generation,ACM Transactions on Multimedia Computing, Communications, and Applications

Sketch-guided Deep Portrait Generation
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2), Pub Date: 2020-07-06, DOI: 10.1145/3396237
Trang-Thi Ho, John Jethro Virtusio, Yung-Yao Chen, Chih-Ming Hsu, Kai-Lung Hua

Generating a realistic human class image from a sketch is a unique and challenging problem, because the human body has a complex structure that must be preserved. Additionally, input sketches often lack details that are crucial to the generation process, which further complicates the problem. In this article, we present an effective method for synthesizing realistic images from human sketches. Our framework incorporates human poses corresponding to the locations of key semantic components (e.g., arm, eyes, nose), since pose is a strong prior for generating human class images. Our sketch-image synthesis framework consists of three stages: semantic keypoint extraction, coarse image generation, and image refinement. First, we extract the semantic keypoints using Part Affinity Fields (PAFs) and a convolutional autoencoder. Then, we integrate the sketch with the semantic keypoints to generate a coarse image of a human. Finally, in the image refinement stage, the coarse image is enhanced by a Generative Adversarial Network (GAN) whose architecture is carefully designed to avoid checkerboard artifacts and to generate photo-realistic results. We evaluate our method on 6,300 sketch-image pairs and show that it generates realistic images and compares favorably against state-of-the-art image synthesis methods.
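
The abstract mentions checkerboard artifacts without saying how the refinement network avoids them. As background only, and not as a description of the authors' architecture, the following minimal sketch illustrates one commonly cited cause of such artifacts: in a transposed convolution whose kernel size is not divisible by its stride, neighboring output positions receive contributions from different numbers of input positions. The function names here are hypothetical and chosen for illustration.

```python
def transposed_conv_overlap(input_len, kernel, stride):
    """Count how many input positions contribute to each output position
    of a 1-D transposed convolution with no padding."""
    out_len = (input_len - 1) * stride + kernel
    counts = [0] * out_len
    for i in range(input_len):
        # Input position i writes into `kernel` consecutive outputs
        # starting at offset i * stride.
        for k in range(kernel):
            counts[i * stride + k] += 1
    return counts


def interior_is_uniform(counts, kernel):
    """Return True if every output position away from the borders
    receives the same number of contributions."""
    interior = counts[kernel - 1 : len(counts) - (kernel - 1)]
    return len(set(interior)) <= 1


if __name__ == "__main__":
    # Kernel 3, stride 2: the interior alternates between 2 and 1 contributions.
    uneven = transposed_conv_overlap(4, kernel=3, stride=2)
    print(uneven, interior_is_uniform(uneven, 3))

    # Kernel 4, stride 2: the interior receives 2 contributions everywhere.
    even = transposed_conv_overlap(4, kernel=4, stride=2)
    print(even, interior_is_uniform(even, 4))
```

The alternating counts in the first case are the one-dimensional analogue of the checkerboard pattern. Choosing a kernel size divisible by the stride, or upsampling by interpolation before an ordinary convolution, are two widely used ways to even out the overlap. Whether the paper uses either is not stated in the abstract.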

Updated: 2020-07-06
