An embedded method: Improve the relevance of text and face image with enhanced face attributes
Signal Processing: Image Communication (IF 3.5), Pub Date: 2022-07-18, DOI: 10.1016/j.image.2022.116815
HongXia Wang, Hao Ke, Chun Liu

To address low image quality and missing specific attributes in the text-to-face synthesis task, this paper proposes EFA, a general embedding method for strengthening face attributes in text-to-image synthesis models. First, we re-encode the irregular word-level descriptions scattered throughout a sentence to form a word encoding. Then, we design an embedded local feature extraction layer for the discriminators of different models so that they learn more specific information related to face attributes. Next, we associate the word encoding with the extracted face image feature regions to obtain a face-attribute domain classification loss for both real and generated images. Finally, during training, this loss constrains the generator and the discriminator and improves their performance. The method improves the quality of text-to-face synthesis and strengthens the semantic correlation between the generated image and the text description. Extensive experiments on the newly released Multi-Modal CelebA-HQ dataset verify the effectiveness of our method, and the results are competitive with the state of the art. In particular, our approach improves FID by 47.75% over AttnGAN, by 33.68% over ControlGAN, by 10.05% over DM-GAN, and by 12.52% over DF-GAN. Code is available at https://github.com/cookie-ke/EFA.
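
The attribute-domain classification step described in the abstract can be pictured with a short PyTorch sketch. All names below (AttributeClassifierHead, attribute_domain_loss, the feature and embedding dimensions, the number of attribute words) are hypothetical illustrations, not the authors' EFA implementation; the official code is at the GitHub link above. The sketch attaches a small local-feature extraction layer to a discriminator's intermediate features, associates each attribute-word encoding with image regions by attention, and applies a multi-label classification loss to both real and generated images.

# Hypothetical sketch of the attribute-domain classification idea -- not the authors' EFA code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeClassifierHead(nn.Module):
    """An embedded local-feature extraction layer plus an attribute
    classifier that could be attached to an existing discriminator."""

    def __init__(self, feat_dim: int, word_dim: int):
        super().__init__()
        # extra conv layer that extracts attribute-related local features
        self.local_extract = nn.Sequential(
            nn.Conv2d(feat_dim, word_dim, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # scores one attended region feature per attribute word
        self.classifier = nn.Linear(word_dim, 1)

    def forward(self, disc_feats, word_enc):
        # disc_feats: (B, C, H, W) intermediate discriminator features
        # word_enc:   (B, A, D)    re-encoded attribute-word embeddings
        local = self.local_extract(disc_feats)            # (B, D, H, W)
        B, D, H, W = local.shape
        regions = local.view(B, D, H * W)                 # (B, D, HW)

        # associate each attribute word with image regions via attention
        attn = torch.softmax(torch.bmm(word_enc, regions), dim=-1)   # (B, A, HW)
        attended = torch.bmm(attn, regions.transpose(1, 2))          # (B, A, D)

        return self.classifier(attended).squeeze(-1)      # (B, A) attribute logits


def attribute_domain_loss(logits_real, logits_fake, attr_labels):
    # Multi-label BCE applied to real and generated images; the paper's
    # exact formulation may differ.
    return (F.binary_cross_entropy_with_logits(logits_real, attr_labels) +
            F.binary_cross_entropy_with_logits(logits_fake, attr_labels))


# Toy usage with made-up shapes.
head = AttributeClassifierHead(feat_dim=512, word_dim=256)
real_feats = torch.randn(4, 512, 16, 16)
fake_feats = torch.randn(4, 512, 16, 16)
word_enc = torch.randn(4, 38, 256)                   # 38 attribute words
labels = torch.randint(0, 2, (4, 38)).float()        # ground-truth attributes
loss = attribute_domain_loss(head(real_feats, word_enc),
                             head(fake_feats, word_enc), labels)

In the setup the abstract describes, such a loss would be added to the usual adversarial losses of the base model (AttnGAN, ControlGAN, DM-GAN, or DF-GAN) so that both the generator and the discriminator are constrained by the attribute signal.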



Updated: 2022-07-18