Processing Simple Geometric Attributes with Autoencoders
Journal of Mathematical Imaging and Vision (IF 2), Pub Date: 2019-11-12, DOI: 10.1007/s10851-019-00924-w
Alasdair Newson, Andrés Almansa, Yann Gousseau, Saïd Ladjal

Image synthesis is a core problem in modern deep learning, and many recent architectures such as autoencoders and generative adversarial networks produce spectacular results on highly complex data, such as images of faces or landscapes. While these results open up a wide range of new, advanced synthesis applications, there is also a severe lack of theoretical understanding of how these networks work. This leads to a range of practical problems, such as difficulties in training, the tendency to sample images with little or no variability, and generalization problems. In this paper, we propose to analyze the ability of the simplest generative network, the autoencoder, to encode and decode two simple geometric attributes: size and position. We believe that, in order to understand more complicated tasks, it is necessary to first understand how these networks process simple attributes. For the first property, size, we analyze the case of images of centered disks with variable radii. We explain how the autoencoder projects these images to and from a latent space of the smallest possible dimension, a scalar. In particular, we describe both the encoding process and a closed-form solution to the decoding training problem in a network without biases, and show that during training, the network indeed finds this solution. We then investigate which regularization approaches yield networks that generalize well. For the second property, position, we look at the encoding and decoding of Dirac delta functions, also known as “one-hot” vectors. We describe a handcrafted filter that achieves encoding perfectly and show that the network naturally finds this filter during training. We also show experimentally that the decoding can be achieved if the dataset is sampled in an appropriate manner.
We hope that the insights given here will provide better understanding of the precise mechanisms used by generative networks and will ultimately contribute to producing more robust and generalizable networks.
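
As a rough illustration of the two toy datasets mentioned in the abstract, the following NumPy sketch builds centered disk images and one-hot vectors and maps each to a single scalar. The helper names, the image size, and the two hand-written codes (pixel sum for size, a linear ramp filter for position) are assumptions chosen for illustration; they are not taken from the paper and need not match the authors' network or their handcrafted filter.

```python
import numpy as np

def disk_image(radius, size=64):
    """Binary image of a disk centered in a size x size grid (hypothetical helper)."""
    yy, xx = np.mgrid[:size, :size]
    c = (size - 1) / 2.0  # geometric center of the pixel grid
    return ((xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2).astype(np.float32)

def one_hot(position, length=32):
    """Discrete Dirac delta: all zeros except a single 1 at `position`."""
    v = np.zeros(length, dtype=np.float32)
    v[position] = 1.0
    return v

def encode_size(img):
    """Assumed scalar code for size: the pixel sum (disk area)."""
    return float(img.sum())

def encode_position(v):
    """Assumed scalar code for position: inner product with a linear ramp."""
    ramp = np.arange(v.shape[0], dtype=np.float32)
    return float(ramp @ v)

if __name__ == "__main__":
    for r in (5, 10, 20):
        print("radius", r, "-> code", encode_size(disk_image(r)))
    for p in (0, 7, 31):
        print("position", p, "-> code", encode_position(one_hot(p)))
```

Both codes are linear in the input, so each could in principle be realized by a single bias-free linear layer; whether a trained autoencoder converges to anything like them is the kind of question the paper studies.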

Updated: 2019-11-12