Rendering Point Clouds with Compute Shaders and Vertex Order Optimization
Computer Graphics Forum (IF 2.5), Pub Date: 2021-07-15, DOI: 10.1111/cgf.14345
Markus Schütz, Bernhard Kerbl, Michael Wimmer

In this paper, we present several compute-based point cloud rendering approaches that outperform the hardware pipeline by up to an order of magnitude and achieve significantly better frame times than previous compute-based methods. Beyond basic closest-point rendering, we also introduce a fast, high-quality variant to reduce aliasing. We present and evaluate several variants of our proposed methods with different flavors of optimization, in order to ensure their applicability and achieve optimal performance on a range of platforms and architectures with varying support for novel GPU hardware features. During our experiments, the observed peak performance was reached rendering 796 million points (12.7 GB) at rates of 62 to 64 frames per second (50 billion points per second, 802 GB/s) on an RTX 3090 without the use of level-of-detail structures. We further introduce an optimized vertex order for point clouds to boost the efficiency of GL_POINTS by a factor of 5 in cases where hardware rendering is compulsory. We compare different orderings and show that Morton-sorted buffers are faster for some viewpoints, while shuffled vertex buffers are faster for others. In contrast, combining both approaches by first sorting according to Morton code and then shuffling the resulting sequence in batches of 128 points leads to a vertex buffer layout with high rendering performance and low sensitivity to viewpoint changes.
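
The combined ordering described at the end of the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the names `morton3d` and `shuffled_morton_order`, the 21-bit quantization, and the fixed seed are all assumptions made for the example. It also assumes that "shuffling in batches of 128 points" means the batches are permuted as whole units while points inside each batch keep their Morton order.

```python
import random

def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def shuffled_morton_order(points, batch_size=128, bits=21, seed=0):
    """Sort points by Morton code, then shuffle whole batches.

    Assumption: points inside a batch stay in Morton order and only
    the order of the batches themselves is randomized.
    """
    if not points:
        return []
    # Quantize coordinates to an integer grid spanning the bounding box.
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    extent = max(hi[a] - lo[a] for a in range(3)) or 1.0
    grid_max = (1 << bits) - 1
    scale = grid_max / extent

    def key(p):
        cell = [min(grid_max, int((p[a] - lo[a]) * scale)) for a in range(3)]
        return morton3d(cell[0], cell[1], cell[2], bits)

    ordered = sorted(points, key=key)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    random.Random(seed).shuffle(batches)  # fixed seed: reproducible layout
    return [p for batch in batches for p in batch]
```

Calling `shuffled_morton_order(points)` on a list of `(x, y, z)` tuples returns the same points in the new order, which could then be written to a vertex buffer. Keeping points within a batch in Morton order keeps neighbouring points close together in memory, while permuting the batches spreads consecutive batches across the model.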

Updated: 2021-07-15