Improved-StoryGAN for sequential images visualization, Journal of Visual Communication and Image Representation

Journal of Visual Communication and Image Representation (IF 2.6), Pub Date: 2020-10-19, DOI: 10.1016/j.jvcir.2020.102956
Chunye Li, Liya Kong, Zhiping Zhou

Story visualization is a novel and challenging task at the intersection of computer vision and natural language processing: given a story, it generates a sequence of images. It is related to text-to-image generation and video generation. Beyond image quality, the synthesized images are expected to be consistent with one another and to reflect the input story. To improve the quality of the generated image sequences, we build on the baseline model StoryGAN. First, we use dilated convolution in the discriminators to expand the receptive field of the convolution kernels over the feature maps, which improves the quality of the generated sequential images. Second, we introduce Weighted Activation Degree (WAD) in the discriminators to provide a robust evaluation of the similarity between the generated images and the target story, which improves the consistency between them. Third, a Bi-GRU stores the past and future information of each sentence to extract textual features effectively. Finally, to make full use of the features of long stories, gated convolution replaces the original MLP in the Initial State Encoder, which improves the consistency among the generated sequential images. Experimental results and generated image sequences show that the proposed model outperforms the other models it is compared with.
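
To illustrate why dilated convolution expands the receptive field, the sketch below computes the receptive field along one axis for a stack of convolution layers. It relies only on the standard relation that a kernel of size k with dilation d spans d(k - 1) + 1 input positions. The two layer stacks are hypothetical examples chosen for illustration; they are not the discriminator configuration from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers):
    """Receptive field along one axis of a stack of convolution layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    ordered from the input side to the output side.
    """
    rf = 1    # a single output position sees at least one input position
    jump = 1  # spacing, in input positions, between adjacent outputs so far
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks of three 3-wide convolutions with stride 1.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and weights, the dilated stack in this example covers roughly twice the input extent, which is the effect the abstract attributes to using dilated convolution in the discriminators.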

Updated: 2020-10-30
