MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
arXiv - CS - Computer Vision and Pattern Recognition, Pub Date: 2021-03-04, DOI: arxiv-2103.02997
Jinshu Chen, Qihui Xu, Qi Kang, MengChu Zhou

In most interactive image generation tasks, users specify regions of interest (ROI), and the generated results are expected to be sufficiently diverse in appearance while preserving the correct and reasonable structures of the original image. Such tasks become more challenging when only limited data is available. Recently proposed generative models can be trained on a single image, but they focus on the features of the sample as a whole and ignore the semantic information of the individual objects within it. As a result, for ROI-based generation tasks, they may produce inappropriate samples that are excessively random and fail to preserve the correct structures of the relevant objects. To address this issue, this work introduces a MOrphologic-structure-aware Generative Adversarial Network, named MOGAN, that produces random samples with diverse appearances and reliable structures from only one image. To train on the ROI, we augment the original image and introduce a novel module that transforms the augmented data into knowledge containing both structures and appearances, thereby enhancing the model's comprehension of the sample. To learn the remaining areas outside the ROI, we employ binary masks to keep their generation isolated from the ROI. Finally, we arrange these learning processes in parallel and hierarchical branches. Compared with other single-image GAN schemes, our approach focuses on internal features, including the preservation of rational structures and variation in appearance. Experiments confirm that our model performs better on ROI-based image generation tasks than its competitive peers.
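
The abstract does not say how the binary masks are applied, so the following is only a minimal sketch of one common way a binary mask can keep two generated regions separate: pixel-wise compositing. The names `composite`, `roi_output`, `rest_output`, and `mask` are hypothetical and do not come from the paper, and the plain nested lists stand in for whatever tensor type a real model would use.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values
Mask = List[List[int]]     # binary mask: 1 inside the ROI, 0 elsewhere


def composite(roi_output: Image, rest_output: Image, mask: Mask) -> Image:
    """Blend two generated images so each contributes only to its own region.

    Illustrative assumption, not the paper's method: pixels where the mask
    is 1 come from roi_output, all other pixels come from rest_output.
    """
    if not (len(roi_output) == len(rest_output) == len(mask)):
        raise ValueError("all inputs must have the same height")
    blended = []
    for roi_row, rest_row, mask_row in zip(roi_output, rest_output, mask):
        if not (len(roi_row) == len(rest_row) == len(mask_row)):
            raise ValueError("all inputs must have the same width")
        # Standard masked blend: m * roi + (1 - m) * rest, per pixel.
        blended.append([
            m * r + (1 - m) * b
            for r, b, m in zip(roi_row, rest_row, mask_row)
        ])
    return blended


# Toy 2x2 example: only the top-left pixel belongs to the ROI.
roi_output = [[0.9, 0.9], [0.9, 0.9]]
rest_output = [[0.1, 0.1], [0.1, 0.1]]
mask = [[1, 0], [0, 0]]
result = composite(roi_output, rest_output, mask)
```

In this toy case only the top-left pixel is taken from `roi_output`, so changes to the non-ROI branch cannot alter the ROI pixels, which is the isolation property the abstract attributes to the masks.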

Updated: 2021-03-05