MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
Jinshu Chen, Qihui Xu, Qi Kang, MengChu Zhou
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-03-04. DOI: arxiv-2103.02997
In most interactive image generation tasks, users specify regions of interest
(ROI), and the generated results are expected to be adequately diverse in
appearance while preserving the correct and reasonable structures of the
original image. Such tasks become more challenging when only limited data is
available.
Recently proposed generative models can be trained on a single image.
However, they focus on the monolithic features of the sample and ignore the
actual semantic information of the different objects inside it. As a result,
for ROI-based generation tasks, they may produce inappropriate samples with
excessive randomness that fail to maintain the correct structures of the
related objects. To address this issue, this work introduces a
MOrphologic-structure-aware Generative Adversarial Network named MOGAN that
produces random samples with diverse appearances and reliable structures based
on only one image. To train on the ROI, we propose using augmented data
derived from the original image and introduce a novel module that transforms
this augmented data into knowledge containing both structures and appearances,
thus enhancing the model's comprehension of the sample. To learn the remaining
areas outside the ROI, we employ binary masks to ensure that their generation
is isolated from the ROI. Finally, we set up parallel and hierarchical branches
of this learning process. Compared with other single-image GAN schemes, our
approach focuses on internal features, including the maintenance of rational
structures and variation in appearance. Experiments confirm that our model
performs better on ROI-based image generation tasks than competing methods.
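
The abstract does not say how the binary masks are applied, so the following is only a minimal sketch of one common way to keep two regions separate, not the authors' implementation. It assumes a mask holding 1 inside the ROI and 0 elsewhere, and images stored as plain nested lists of grayscale values. The names `composite` and `masked_l1` are hypothetical helpers introduced here for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values
Mask = List[List[int]]     # assumed convention: 1 inside the ROI, 0 elsewhere


def composite(roi_branch: Image, rest_branch: Image, roi_mask: Mask) -> Image:
    """Take ROI pixels from one branch and all other pixels from the other."""
    return [
        [m * r + (1 - m) * b for r, b, m in zip(roi_row, rest_row, mask_row)]
        for roi_row, rest_row, mask_row in zip(roi_branch, rest_branch, roi_mask)
    ]


def masked_l1(generated: Image, target: Image, roi_mask: Mask) -> float:
    """Mean absolute error over non-ROI pixels only (mask value 0)."""
    diffs = [
        abs(g - t)
        for g_row, t_row, m_row in zip(generated, target, roi_mask)
        for g, t, m in zip(g_row, t_row, m_row)
        if m == 0
    ]
    # An all-ROI mask leaves nothing to compare, so report zero error.
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Because the mask is binary, each output pixel of `composite` comes from exactly one branch, and `masked_l1` ignores whatever is generated inside the ROI. This is one plausible reading of "generation isolated from ROI".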
Updated: 2021-03-05