Towards Realistic 3D Embedding via View Alignment,arXiv - CS - Computer Vision and Pattern Recognition

Towards Realistic 3D Embedding via View Alignment
arXiv - CS - Computer Vision and Pattern Recognition, Pub Date: 2020-07-14, DOI: arxiv-2007.07066
Fangneng Zhan, Shijian Lu, Changgong Zhang, Feiying Ma and Xuansong Xie

Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition, which generates new images by automatically embedding foreground objects of interest into background images. However, most existing works deal with foreground objects in two-dimensional (2D) images, even though foreground objects in three-dimensional (3D) models are more flexible, offering 360-degree view freedom. This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically. VA-GAN consists of a texture generator and a differential discriminator that are interconnected and end-to-end trainable. The differential discriminator guides the learning of geometric transformations from background images so that the composed 3D models can be aligned with the background images with realistic poses and views. The texture generator adopts a novel view encoding mechanism to generate accurate object textures for the 3D models under the estimated views. Extensive experiments on two synthesis tasks (car synthesis with KITTI and pedestrian synthesis with Cityscapes) show that VA-GAN achieves high-fidelity composition, both qualitatively and quantitatively, compared with state-of-the-art generation methods.

Updated: 2020-07-15
