Neural scene representation and rendering
Science (IF 56.9), Pub Date: 2018-06-14, DOI: 10.1126/science.aar6170
S. M. Ali Eslami, Danilo Jimenez Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil Rabinowitz, Helen King, Chloe Hillier, Matt Botvinick, Daan Wierstra, Koray Kavukcuoglu, Demis Hassabis
A scene-internalizing computer program

To train a computer to “recognize” elements of a scene supplied by its visual sensors, computer scientists typically use millions of images painstakingly labeled by humans. Eslami et al. developed an artificial vision system, dubbed the Generative Query Network (GQN), that has no need for such labeled data. Instead, the GQN first uses images taken from different viewpoints to create an abstract description of the scene, learning its essentials. Next, on the basis of this representation, the network predicts what the scene would look like from a new, arbitrary viewpoint.

Science, this issue p. 1204

A computer vision system predicts how a 3D scene looks from any viewpoint after just a few 2D views from other viewpoints.

Scene representation—the process of converting visual sensory data into concise descriptions—is a requirement for intelligent behavior. Recent work has shown that neural networks excel at this task when provided with large, labeled datasets. However, removing the reliance on human labeling remains an important open problem. To this end, we introduce the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them.
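The structure the abstract describes has two parts: a representation network that encodes each observed (image, viewpoint) pair and sums the results into a single scene code, and a generation network that decodes that code for a query viewpoint. Below is a minimal sketch of that structure in PyTorch; the layer sizes, the 5-number viewpoint encoding, and the deterministic decoder are simplifying assumptions (the published model uses a convolutional “tower” encoder and a DRAW-style recurrent latent-variable generator).

# Minimal sketch of the GQN's two-network structure (simplified; all
# layer sizes and the viewpoint encoding are illustrative assumptions).
import torch
import torch.nn as nn

class RepresentationNet(nn.Module):
    """Encodes one (image, viewpoint) observation into a scene code."""
    def __init__(self, repr_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Viewpoint assumed encoded as 5 numbers (e.g., 3-D position + yaw/pitch).
        self.fc = nn.Linear(64 + 5, repr_dim)

    def forward(self, image, viewpoint):
        h = self.conv(image).flatten(1)            # (B, 64)
        return self.fc(torch.cat([h, viewpoint], dim=1))

class QueryGenerator(nn.Module):
    """Predicts the image seen from a query viewpoint, given the scene code."""
    def __init__(self, repr_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(repr_dim + 5, 512), nn.ReLU(),
            nn.Linear(512, 3 * 16 * 16),           # tiny 16x16 RGB output
        )

    def forward(self, scene_repr, query_viewpoint):
        x = torch.cat([scene_repr, query_viewpoint], dim=1)
        return self.net(x).view(-1, 3, 16, 16)

# Key idea from the paper: per-observation codes are summed into an
# order-invariant scene representation, then decoded for an unseen viewpoint.
f, g = RepresentationNet(), QueryGenerator()
images = torch.rand(4, 3, 3, 64, 64)               # 4 scenes, 3 views each
views = torch.rand(4, 3, 5)
scene_repr = sum(f(images[:, i], views[:, i]) for i in range(3))
prediction = g(scene_repr, torch.rand(4, 5))       # (4, 3, 16, 16)

Because the per-observation codes are summed, the representation is invariant to the order and number of input views, which is what lets the network be queried after seeing only a few observations.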
