The Geometry of Deep Generative Image Models and its Applications
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2021-01-15 , DOI: arxiv-2101.06006
Binxu Wang, Carlos R. Ponce

Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets, such as natural images. These networks are trained to map random inputs in their latent space to new samples representative of the learned data. However, the structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator, which limits the usefulness of the models. Understanding the latent space requires a way to identify input codes for existing real-world images (inversion), and a way to identify directions with known image transformations (interpretability). Here, we use a geometric framework to address both issues simultaneously. We develop an architecture-agnostic method to compute the Riemannian metric of the image manifold created by GANs. The eigen-decomposition of the metric isolates axes that account for different levels of image variability. An empirical analysis of several pretrained GANs shows that image variation around each position is concentrated along surprisingly few major axes (the space is highly anisotropic) and the directions that create this large variation are similar at different positions in the space (the space is homogeneous). We show that many of the top eigenvectors correspond to interpretable transforms in the image space, with a substantial part of eigenspace corresponding to minor transforms which could be compressed out. This geometric understanding unifies key previous results related to GAN interpretability. We show that the use of this metric allows for more efficient optimization in the latent space (e.g. GAN inversion) and facilitates unsupervised discovery of interpretable axes. Our results illustrate that defining the geometry of the GAN image manifold can serve as a general framework for understanding GANs.
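The metric and its eigen-decomposition can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes squared Euclidean distance in image space, under which the pullback metric at a latent code z is H(z) = JᵀJ, with J the Jacobian of the generator; the abstract does not say which image distance the paper uses. `toy_generator`, `jacobian_fd`, `pullback_metric`, and `metric_spectrum` are hypothetical names introduced for illustration, and the Jacobian is estimated by finite differences so that the generator is treated as a black box.

```python
import numpy as np


def toy_generator(z):
    """Hypothetical stand-in for a GAN generator: 3-D latent code -> 4-D 'image'."""
    z = np.asarray(z, dtype=float)
    return np.array([
        10.0 * z[0],      # latent axis 0 changes the output strongly
        np.tanh(z[1]),    # latent axis 1 changes it moderately
        0.01 * z[2],      # latent axis 2 barely changes it
        z[0] * z[1],      # mild non-linear coupling between axes 0 and 1
    ])


def jacobian_fd(g, z, eps=1e-5):
    """Central finite-difference Jacobian of g at z, shape (image_dim, latent_dim)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((g(z + dz) - g(z - dz)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def pullback_metric(g, z):
    """Metric H = J^T J induced on latent space (assumes squared L2 image distance)."""
    J = jacobian_fd(g, z)
    return J.T @ J


def metric_spectrum(g, z):
    """Eigenvalues (descending) and matching eigenvectors (columns) of the metric."""
    H = pullback_metric(g, z)
    eigvals, eigvecs = np.linalg.eigh(H)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Spectrum at one latent position: a few large eigenvalues signal anisotropy.
z0 = np.array([0.3, -0.2, 0.5])
eigvals, eigvecs = metric_spectrum(toy_generator, z0)
anisotropy = eigvals[0] / eigvals[-1]

# Spectrum at a second position: aligned top eigenvectors signal homogeneity.
z1 = np.array([-1.0, 0.8, 0.1])
_, eigvecs_1 = metric_spectrum(toy_generator, z1)
top_axis_alignment = abs(eigvecs[:, 0] @ eigvecs_1[:, 0])
```

In this toy setting, `anisotropy` is large because one latent axis dominates the image change, and `top_axis_alignment` is close to 1 because the dominant axis is nearly the same at both positions. For a real generator, one would likely replace the finite-difference loop with automatic differentiation and the Euclidean distance with whatever image distance the analysis calls for.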

Updated: 2021-01-18