Pretraining Image Encoders without Reconstruction via Feature Prediction Loss
arXiv - CS - Machine Learning. Pub Date: 2020-03-16, arXiv:2003.07441
Gustav Grund Pihlgren (1), Fredrik Sandin (1), Marcus Liwicki (1) ((1) Luleå University of Technology)

This work investigates three methods for calculating loss for autoencoder-based pretraining of image encoders: the commonly used reconstruction loss, the more recently introduced deep perceptual similarity loss, and a feature prediction loss proposed here; the latter turns out to be the most efficient choice. Standard autoencoder pretraining for deep learning tasks is done by comparing the input image and the reconstructed image. Recent work shows that predictions based on embeddings generated by image autoencoders can be improved by training with perceptual loss, i.e., by adding a loss network after the decoding step. So far, autoencoders trained with loss networks have implemented an explicit comparison of the original and reconstructed images using the loss network. However, given such a loss network, we show that there is no need for the time-consuming task of decoding the entire image. Instead, we propose to decode the features of the loss network, hence the name "feature prediction loss". To evaluate this method we perform experiments on three standard publicly available datasets (LunarLander-v2, STL-10, and SVHN) and compare six different procedures for training image encoders (pixel-wise, perceptual similarity, and feature prediction losses, combined with two variations of image and feature encoding/decoding). The embedding-based prediction results show that encoders trained with feature prediction loss are as good as or better than those trained with the other two losses. Additionally, encoders are significantly faster to train with feature prediction loss than with the other losses. The method implementation used in this work is available online: https://github.com/guspih/Perceptual-Autoencoders
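To make the distinction between the three losses concrete, here is a minimal PyTorch-style sketch. The encoder and the two decoders are hypothetical placeholder modules, AlexNet is assumed as the loss network, and MSE is assumed as the distance measure; the actual architectures and comparisons used in the paper are in the linked repository and may differ.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Fixed, pretrained loss network (assumed here: AlexNet's convolutional
# features). Its parameters are frozen during autoencoder pretraining.
loss_network = models.alexnet(pretrained=True).features.eval()
for p in loss_network.parameters():
    p.requires_grad = False

def reconstruction_loss(encoder, image_decoder, x):
    """Pixel-wise loss: compare the input image with its reconstruction."""
    x_hat = image_decoder(encoder(x))
    return nn.functional.mse_loss(x_hat, x)

def perceptual_loss(encoder, image_decoder, x):
    """Deep perceptual similarity: decode a full image, then compare the
    loss network's features of the original and the reconstruction."""
    x_hat = image_decoder(encoder(x))
    return nn.functional.mse_loss(loss_network(x_hat), loss_network(x))

def feature_prediction_loss(encoder, feature_decoder, x):
    """Feature prediction: skip image decoding entirely and regress the
    loss network's features of the input directly from the embedding.
    feature_decoder must output a tensor shaped like loss_network(x)."""
    with torch.no_grad():
        target = loss_network(x)  # fixed target, no gradient needed
    pred = feature_decoder(encoder(x))
    return nn.functional.mse_loss(pred, target)
```

Under these assumptions, the speed advantage the abstract reports is visible in the structure of the code: the feature prediction variant never decodes a full-resolution image and runs the loss network only once per batch (on the input, with gradients disabled), whereas the perceptual variant decodes an image and passes both it and the input through the loss network on every step.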

Updated: 2020-07-16