Adversarial Generation of Continuous Images,arXiv - CS - Artificial Intelligence



arXiv - CS - Artificial Intelligence. Pub Date: 2020-11-24. DOI: arxiv-2011.12026
Ivan Skorokhodov, Savva Ignatyev, Mohamed Elhoseiny

In most existing learning systems, images are typically viewed as 2D pixel arrays. However, in another paradigm gaining popularity, a 2D image is represented as an implicit neural representation (INR): an MLP that predicts an RGB pixel value given its (x, y) coordinate. In this paper, we propose two novel architectural techniques for building INR-based image decoders, factorized multiplicative modulation and multi-scale INRs, and use them to build a state-of-the-art continuous image GAN. Previous attempts to adapt INRs for image generation were limited to MNIST-like datasets and did not scale to complex real-world data. Our proposed architectural design improves the performance of continuous image generators by 6 to 40 times and reaches FID scores of 6.27 on LSUN bedroom 256x256 and 16.32 on FFHQ 1024x1024, greatly reducing the gap between continuous image GANs and pixel-based ones. To the best of our knowledge, these are the best reported scores for an image generator that consists entirely of fully-connected layers. Apart from that, we explore several exciting properties of INR-based decoders, such as out-of-the-box super-resolution, meaningful image-space interpolation, accelerated inference of low-resolution images, an ability to extrapolate outside of image boundaries, and a strong geometric prior. The source code is available at https://github.com/universome/inr-gan
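To make the INR idea concrete, here is a minimal, dependency-free sketch of an image stored as a coordinate MLP. It is not the architecture from the paper: it has no factorized multiplicative modulation and no multi-scale structure, and the weights are random rather than trained. The names `init_inr`, `inr_pixel`, and `render`, the layer sizes, the ReLU and sigmoid activations, and the normalization of coordinates to [0, 1] are all assumptions made for illustration.

```python
import math
import random


def init_inr(seed=0, hidden=16):
    """Random (untrained) weights for a 2 -> hidden -> hidden -> 3 MLP."""
    rng = random.Random(seed)
    sizes = [2, hidden, hidden, 3]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def inr_pixel(layers, x, y):
    """Map one (x, y) coordinate to an (r, g, b) triple in (0, 1)."""
    h = [x, y]
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(v, 0.0) for v in h]  # ReLU on hidden layers (assumed)
    # Sigmoid squashes the three outputs into a valid colour range (assumed).
    return tuple(1.0 / (1.0 + math.exp(-v)) for v in h)


def render(layers, height, width, lo=0.0, hi=1.0):
    """Sample the INR on a regular height x width grid over [lo, hi]^2."""
    def axis(n):
        return [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [[inr_pixel(layers, x, y) for x in axis(width)]
            for y in axis(height)]


if __name__ == "__main__":
    inr = init_inr(seed=0)
    low = render(inr, 4, 4)                   # coarse sampling
    high = render(inr, 7, 7)                  # same image, denser grid
    wide = render(inr, 5, 5, lo=-0.5, hi=1.5)  # beyond the unit square
    print(len(low), len(high), len(wide))
```

Because the image is a function of continuous coordinates, the same weights can be sampled on a denser grid or queried outside the unit square, which is the mechanism behind the super-resolution and extrapolation properties listed in the abstract. With random weights the output is only a smooth colour field, not a meaningful image.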


Updated: 2020-11-25
