ACORN: Adaptive Coordinate Networks for Neural Scene Representation
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-05-06, DOI: arxiv-2105.02788
Julien N. P. Martel, David B. Lindell, Connor Z. Lin, Eric R. Chan, Marco Monteiro, Gordon Wetzstein
Neural representations have emerged as a new paradigm for applications in
rendering, imaging, geometric modeling, and simulation. Compared to traditional
representations such as meshes, point clouds, or volumes, they can be flexibly
incorporated into differentiable learning-based pipelines. While recent
improvements to neural representations now make it possible to represent
signals with fine details at moderate resolutions (e.g., for images and 3D
shapes), adequately representing large-scale or complex scenes has proven a
challenge. Current neural representations fail to accurately represent images
at resolutions greater than a megapixel or 3D scenes with more than a few
hundred thousand polygons. Here, we introduce a new hybrid implicit-explicit
network architecture and training strategy that adaptively allocates resources
during training and inference based on the local complexity of a signal of
interest. Our approach uses a multiscale block-coordinate decomposition,
similar to a quadtree or octree, that is optimized during training. The network
architecture operates in two stages: using the bulk of the network parameters,
a coordinate encoder generates a feature grid in a single forward pass. Then,
hundreds or thousands of samples within each block can be efficiently evaluated
using a lightweight feature decoder. With this hybrid implicit-explicit network
architecture, we demonstrate the first experiments that fit gigapixel images to
nearly 40 dB peak signal-to-noise ratio. Notably, this represents an increase in
scale of over 1000x compared to the resolution of previously demonstrated
image-fitting experiments. Moreover, our approach is able to represent 3D
shapes significantly faster and better than previous techniques; it reduces
training times from days to hours or minutes and memory requirements by over an
order of magnitude.
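The two-stage evaluation described in the abstract can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the quadtree partition is fixed by hand rather than optimized during training, the "encoder" is a single random linear layer standing in for the large coordinate network (fed the block center and level, an assumption), the decoder is assumed to be bilinear interpolation of the block's feature grid followed by one linear layer, and all names (`Block`, `CoordinateEncoder`, `FeatureDecoder`, `bilerp`, `evaluate`) and sizes (`FEAT_CH`, `GRID`) are hypothetical. What it does show is the cost structure: the encoder runs once per block, and every sample falling inside that block reuses the resulting feature grid.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical sizes chosen for illustration, not values from the paper.
FEAT_CH = 4  # feature channels per grid cell
GRID = 3     # each block's feature grid is GRID x GRID (must be >= 2)

@dataclass(frozen=True)
class Block:
    """One quadtree cell covering part of the unit square [0, 1)^2."""
    level: int  # quadtree depth; level 0 covers the whole domain
    ix: int     # column index among the 2**level columns at this depth
    iy: int     # row index among the 2**level rows at this depth

    @property
    def size(self) -> float:
        return 1.0 / 2 ** self.level

    def contains(self, x: float, y: float) -> bool:
        s = self.size
        return self.ix * s <= x < (self.ix + 1) * s and self.iy * s <= y < (self.iy + 1) * s

    def to_local(self, x: float, y: float) -> tuple:
        # Global coordinate -> block-local coordinate in [0, 1).
        s = self.size
        return (x - self.ix * s) / s, (y - self.iy * s) / s

class CoordinateEncoder:
    """Stand-in for the heavy encoder (assumption: one random linear layer + tanh)."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)
        n_out = FEAT_CH * GRID * GRID
        # Maps (block center x, block center y, level) to every value of the feature grid.
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_out)]
        self.calls = 0  # counts forward passes to expose the per-block cost

    def __call__(self, b: Block) -> list:
        self.calls += 1
        cx, cy = (b.ix + 0.5) * b.size, (b.iy + 0.5) * b.size
        flat = [math.tanh(w0 * cx + w1 * cy + w2 * b.level) for w0, w1, w2 in self.w]
        # Reshape the flat output into grid[row][col] -> list of FEAT_CH features.
        return [[flat[(r * GRID + c) * FEAT_CH:(r * GRID + c + 1) * FEAT_CH]
                 for c in range(GRID)] for r in range(GRID)]

def bilerp(grid: list, u: float, v: float) -> list:
    """Bilinearly interpolate a feature grid at local coordinate (u, v) in [0, 1]."""
    gx, gy = u * (GRID - 1), v * (GRID - 1)
    c0, r0 = min(int(gx), GRID - 2), min(int(gy), GRID - 2)
    tx, ty = gx - c0, gy - r0
    out = []
    for k in range(FEAT_CH):
        top = (1 - tx) * grid[r0][c0][k] + tx * grid[r0][c0 + 1][k]
        bot = (1 - tx) * grid[r0 + 1][c0][k] + tx * grid[r0 + 1][c0 + 1][k]
        out.append((1 - ty) * top + ty * bot)
    return out

class FeatureDecoder:
    """Lightweight decoder (assumption: a single linear layer to one output value)."""

    def __init__(self, seed: int = 1):
        rng = random.Random(seed)
        self.w = [rng.gauss(0.0, 1.0) for _ in range(FEAT_CH)]

    def __call__(self, feat: list) -> float:
        return sum(wi * fi for wi, fi in zip(self.w, feat))

def evaluate(blocks, encoder, decoder, points):
    cache = {}  # block -> feature grid, so the encoder runs once per block
    values = []
    for x, y in points:
        b = next(blk for blk in blocks if blk.contains(x, y))
        if b not in cache:
            cache[b] = encoder(b)  # stage 1: one encoder pass per block
        u, v = b.to_local(x, y)
        values.append(decoder(bilerp(cache[b], u, v)))  # stage 2: cheap per-sample decode
    return values

# Hand-built partition: the quadrant [0, 0.5)^2 is refined into four level-2 blocks.
blocks = [Block(1, 1, 0), Block(1, 0, 1), Block(1, 1, 1)]
blocks += [Block(2, i, j) for i in range(2) for j in range(2)]

enc, dec = CoordinateEncoder(), FeatureDecoder()
pts = [(i / 32 + 1 / 64, j / 32 + 1 / 64) for i in range(32) for j in range(32)]
vals = evaluate(blocks, enc, dec, pts)
print(len(vals), enc.calls)  # prints: 1024 7
```

Here 1,024 samples spread over a seven-block partition cost only seven encoder passes. Replacing a block with its four children spends more encoder capacity on that region, which is the intuition behind the adaptive allocation the abstract describes; in this sketch the refinement is chosen by hand rather than learned.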
Updated: 2021-05-07