

Editable Free-viewpoint Video Using a Layered Neural Representation
arXiv - CS - Graphics, Pub Date: 2021-04-30, DOI: arxiv-2104.14786
Jiakai Zhang, Xinhang Liu, Xinyi Ye, Fuqiang Zhao, Yanshun Zhang, Minye Wu, Yingliang Zhang, Lan Xu, Jingyi Yu

Generating free-viewpoint videos is critical for immersive VR/AR experiences, but recent neural advances still lack the editing ability to manipulate the visual perception of large dynamic scenes. To fill this gap, we propose the first approach for editable, photo-realistic free-viewpoint video generation for large-scale dynamic scenes using only 16 sparse cameras. The core of our approach is a new layered neural representation, in which each dynamic entity, including the environment itself, is formulated as a space-time coherent neural layered radiance representation called ST-NeRF. This layered representation supports full perception and realistic manipulation of the dynamic scene while still supporting a free viewing experience over a wide range. In ST-NeRF, each dynamic entity/layer is represented as a continuous function, which disentangles the location, deformation, and appearance of the entity in a continuous and self-supervised manner. We propose scene-parsing 4D label map tracking to disentangle the spatial information explicitly, and a continuous deform module to disentangle the temporal motion implicitly. An object-aware volume rendering scheme is further introduced to re-assemble all the neural layers. We adopt a novel layered loss and a motion-aware ray sampling strategy to enable efficient training for a large dynamic scene with multiple performers. Our framework further enables a variety of editing functions, such as manipulating the scale and location of individual neural layers, or duplicating or retiming them, to create numerous visual effects while preserving high realism. Extensive experiments demonstrate the effectiveness of our approach in achieving high-quality, photo-realistic, and editable free-viewpoint video generation for dynamic scenes.
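The idea of editing a scene by transforming and re-compositing per-entity layers can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the networks, the deform module, or the details of the object-aware rendering scheme. Here each layer is a hypothetical analytic function (a colored ball that may move over time), an edit is a wrapper that maps query points and time back into the layer's original frame, and overlapping layers are merged with a density-weighted color mix followed by standard volume-rendering quadrature, which is an assumption chosen for simplicity. All names (`toy_layer`, `edit_layer`, `render_ray`) are made up for illustration.

```python
import numpy as np

def toy_layer(center, radius, color, velocity=(0.0, 0.0, 0.0)):
    """Hypothetical stand-in for a trained layer: a colored ball moving linearly in time."""
    center = np.asarray(center, dtype=float)
    color = np.asarray(color, dtype=float)
    velocity = np.asarray(velocity, dtype=float)

    def field(points, t):
        dist = np.linalg.norm(points - (center + velocity * t), axis=-1)
        density = 20.0 * (dist < radius)          # constant density inside the ball
        rgb = np.broadcast_to(color, points.shape)
        return density, rgb

    return field

def edit_layer(field, scale=1.0, offset=(0.0, 0.0, 0.0), time_shift=0.0):
    """Scale, move, or retime a layer by inverse-mapping each query into its original frame."""
    offset = np.asarray(offset, dtype=float)

    def edited(points, t):
        return field((points - offset) / scale, t - time_shift)

    return edited

def render_ray(layers, origin, direction, t, near=0.0, far=4.0, n_samples=256):
    """Composite all layers along one ray (direction is assumed to be unit length)."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    depths = np.linspace(near, far, n_samples)
    delta = depths[1] - depths[0]
    points = origin + depths[:, None] * direction            # (N, 3)

    sigmas, rgbs = zip(*(layer(points, t) for layer in layers))
    sigmas = np.stack(sigmas)                                # (L, N)
    rgbs = np.stack(rgbs)                                    # (L, N, 3)

    # Assumed merge rule: total density is the sum, color is density-weighted.
    sigma = sigmas.sum(axis=0)
    mix = sigmas / np.maximum(sigma, 1e-8)
    rgb = (mix[..., None] * rgbs).sum(axis=0)                # (N, 3)

    # Standard volume-rendering quadrature along the ray.
    alpha = 1.0 - np.exp(-sigma * delta)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    return (weights[:, None] * rgb).sum(axis=0)

# Usage: a red ball in front of a blue ball, then the same scene with the red ball moved aside.
red = toy_layer((0.0, 0.0, 2.0), 0.5, (1.0, 0.0, 0.0))
blue = toy_layer((0.0, 0.0, 3.0), 0.5, (0.0, 0.0, 1.0))
ray_origin, ray_dir = np.zeros(3), np.array([0.0, 0.0, 1.0])
original = render_ray([red, blue], ray_origin, ray_dir, t=0.0)
edited = render_ray([edit_layer(red, offset=(5.0, 0.0, 0.0)), blue], ray_origin, ray_dir, t=0.0)
```

Duplicating an entity in this sketch amounts to adding a second, differently edited wrapper of the same layer to the list, and retiming amounts to passing a nonzero `time_shift`. Because edits only remap the inputs of each layer, the underlying layer functions stay untouched.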


Updated: 2021-05-03
