Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs
arXiv - CS - Graphics. Pub Date: 2021-06-25, DOI: arxiv-2106.13416
Yuki Endo, Yoshihiro Kanamori

Semantic image synthesis is the process of generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles because object appearance depends on multiple factors. To handle the individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class, from local to global levels, by learning multiple latent spaces. Furthermore, through extensive experiments with real and synthetic datasets in three different domains, we demonstrate that our method generates images that are both plausible and more diverse than those of state-of-the-art methods. We also show that our method enables a wide range of applications in image synthesis and editing tasks.
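
The idea of keeping one latent code per object class and per layer, rather than a single global code, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`sample_latents`, `broadcast_to_mask`), the standard normal prior, the latent dimension, and the 2x3 mask are all assumptions made for illustration, and no encoder or generator network is modeled.

```python
import random

def sample_latents(num_classes, num_layers, dim, rng):
    """Draw one latent code per (layer, class) pair from a standard normal prior.

    Hypothetical helper: a real model would use codes from learned latent spaces.
    """
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_classes)]
            for _ in range(num_layers)]

def broadcast_to_mask(mask, class_codes):
    """Assign each pixel the latent code of its semantic class (hypothetical helper)."""
    return [[class_codes[label] for label in row] for row in mask]

rng = random.Random(0)
mask = [[0, 0, 1],
        [2, 1, 1]]  # toy 2x3 semantic mask with class labels 0..2

# One code per class at each of two layers (assumed sizes).
latents = sample_latents(num_classes=3, num_layers=2, dim=4, rng=rng)
style_maps = [broadcast_to_mask(mask, layer_codes) for layer_codes in latents]

# Resample only class 1 at layer 0; other classes and layers keep their codes.
latents[0][1] = [rng.gauss(0.0, 1.0) for _ in range(4)]
edited = broadcast_to_mask(mask, latents[0])
```

In this sketch, resampling the code for one class at one layer changes only the pixels labeled with that class in that layer's map, which is the kind of per-class, per-layer control the abstract describes.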

Updated: 2021-06-28