Clothing generation by multi-modal embedding: A compatibility matrix-regularized GAN model
Image and Vision Computing (IF 4.7) · Pub Date: 2021-01-06 · DOI: 10.1016/j.imavis.2021.104097
Linlin Liu, Haijun Zhang, Dongliang Zhou

Clothing compatibility learning has gained increasing research attention because a well-coordinated outfit can express personality and greatly improve an individual's appearance. In this paper, we propose a Compatibility Matrix-Regularized Generative Adversarial Network (CMRGAN) for compatible item generation. In particular, we utilize a multi-modal embedding to transform the image and text information of an input clothing item into a latent feature code. Compatibility learning is then performed among the latent features to obtain a compatibility style space, and the features of the input image are regularized by this style space. Finally, a compatible clothing image is generated by a decoder fed with the regularized features. To verify the proposed model, we train an Inception-v3 classification model to evaluate the authenticity of synthesized images, a VGG-based regression scoring model to measure the compatibility degree of the generated image pairs, and a deep attentional multimodal similarity model to evaluate the semantic similarity between generated images and ground-truth text descriptions. To give an objective evaluation, these models are trained on datasets consisting of fashion data only. The results demonstrate the effectiveness of the proposed method on image-to-image translation based on the compatibility space.
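The generation pipeline described above can be sketched end to end: fuse the image and text features of the input item into a latent code, regularize that code through a learned compatibility matrix to obtain a style-space code, and decode the result into an image tensor. The sketch below is a minimal illustration only, not the paper's implementation; all dimensions, the residual form of the matrix regularization, and the randomly initialized weights standing in for trained parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not specify these.
D_IMG, D_TXT, D_LATENT = 512, 128, 256

def multimodal_embed(img_feat, txt_feat, W):
    """Fuse image and text features of one item into a latent code."""
    return np.tanh(W @ np.concatenate([img_feat, txt_feat]))

def compatibility_regularize(latent, M):
    """Regularize the latent code with a compatibility matrix M,
    yielding a style-space code (residual form is an assumption)."""
    return latent + M @ latent

def decode(style_code, W_dec, out_shape=(8, 8, 3)):
    """Toy decoder: map the regularized code to a small 'image' tensor."""
    return np.tanh(W_dec @ style_code).reshape(out_shape)

# Randomly initialized parameters stand in for trained weights.
W_embed = rng.normal(0.0, 0.02, (D_LATENT, D_IMG + D_TXT))
M = rng.normal(0.0, 0.02, (D_LATENT, D_LATENT))
W_dec = rng.normal(0.0, 0.02, (8 * 8 * 3, D_LATENT))

# Pretend features extracted from the input clothing item.
img_feat = rng.normal(size=D_IMG)
txt_feat = rng.normal(size=D_TXT)

z = multimodal_embed(img_feat, txt_feat, W_embed)   # latent feature code
s = compatibility_regularize(z, M)                  # compatibility style code
out = decode(s, W_dec)                              # generated compatible item
print(out.shape)  # (8, 8, 3)
```

In the actual model, the encoder, compatibility matrix, and decoder would be trained adversarially, with the evaluation models (Inception-v3, VGG scorer, DAMSM) applied only at test time.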




Updated: 2021-01-13