Anycost GANs for Interactive Image Synthesis and Editing, arXiv - CS - Computer Vision and Pattern Recognition


arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-03-04, DOI: arxiv-2103.03243
Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu

Generative adversarial networks (GANs) enable photorealistic image synthesis and editing. However, because large generators such as StyleGAN2 are computationally expensive, it usually takes several seconds to see the result of a single edit on an edge device, which rules out an interactive user experience. In this paper, we take inspiration from modern rendering software and propose Anycost GAN for interactive natural image editing. We train Anycost GAN to support elastic resolutions and channels, so that images can be generated faster at a range of speeds. Running a subset of the full generator produces outputs that are perceptually similar to those of the full generator, which makes them a good proxy for previews. With sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated under various configurations while achieving better image quality than separately trained models. In addition, we develop new encoder training and latent code optimization techniques that encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed under various cost budgets (up to a 10x reduction in computation) and adapts to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model provides perceptually similar previews at a 6-12x speedup, enabling interactive image editing. The code and demo are publicly available: https://github.com/mit-han-lab/anycost-gan.

Original English title and abstract:

Anycost GANs for Interactive Image Synthesis and Editing

Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing. However, due to the high computational cost of large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the results of a single edit on edge devices, prohibiting interactive user experience. In this paper, we take inspiration from modern rendering software and propose Anycost GAN for interactive natural image editing. We train the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds. Running subsets of the full generator produces outputs that are perceptually similar to the full generator, making them a good proxy for preview. By using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated at various configurations while achieving better image quality compared to separately trained models. Furthermore, we develop new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed at various cost budgets (up to 10x computation reduction) and adapts to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model can provide perceptually similar previews at 6-12x speedup, enabling interactive image editing. The code and demo are publicly available: https://github.com/mit-han-lab/anycost-gan.
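
The idea of "running subsets of the full generator" can be illustrated with a toy sketch. The code below is not the authors' implementation and is not taken from the anycost-gan repository. It assumes a small stack of fully connected layers in place of a real StyleGAN2-style generator, and all names (build_layers, run_generator, mac_count, channel_ratio) are hypothetical. It only shows the weight-sharing mechanics: a sub-generator reuses the first fraction of each layer's channels, which lowers the multiply-accumulate count. The weights here are random and untrained, so the outputs of the sub-network are not expected to resemble those of the full network; in the paper, that consistency comes from the training procedure.

```python
import random


def build_layers(dims, seed=0):
    """Create random weights for a stack of linear layers (hypothetical toy model).

    dims = [d0, d1, ..., dn] gives n layers; layer i has shape (d_{i+1}, d_i).
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(c_in)] for _ in range(c_out)]
        for c_in, c_out in zip(dims[:-1], dims[1:])
    ]


def active_channels(layers, channel_ratio):
    """Number of output channels used per layer; the last layer stays full (e.g. RGB)."""
    last = len(layers) - 1
    return [
        len(w) if i == last else max(1, int(len(w) * channel_ratio))
        for i, w in enumerate(layers)
    ]


def run_generator(layers, z, channel_ratio=1.0):
    """Run the shared weights using only the first fraction of hidden channels."""
    widths = active_channels(layers, channel_ratio)
    x = list(z)
    for i, (w, c_out) in enumerate(zip(layers, widths)):
        # Slice rows (output channels) and columns (input channels) of the shared weight.
        x = [sum(wv * xv for wv, xv in zip(row[: len(x)], x)) for row in w[:c_out]]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers
    return x


def mac_count(layers, z_dim, channel_ratio=1.0):
    """Multiply-accumulate operations for one forward pass at the given channel ratio."""
    total, c_in = 0, z_dim
    for c_out in active_channels(layers, channel_ratio):
        total += c_in * c_out
        c_in = c_out
    return total


if __name__ == "__main__":
    layers = build_layers([8, 16, 16, 3])
    z = [0.1] * 8
    full = run_generator(layers, z, channel_ratio=1.0)
    preview = run_generator(layers, z, channel_ratio=0.5)
    print(len(full), len(preview))
    print(mac_count(layers, 8, 1.0), mac_count(layers, 8, 0.5))
```

In this toy setting, halving the hidden channels roughly halves or quarters the cost of each layer, depending on whether its input, its output, or both are sliced. The same slicing idea applied to convolutional layers, combined with lower output resolutions, is what would allow a single set of weights to serve several cost budgets.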

Updated: 2021-03-05
