Accelerating range minimum queries with ray tracing cores
Future Generation Computer Systems (IF 7.5), Pub Date: 2024-03-26, DOI: 10.1016/j.future.2024.03.040
Enzo Meneses, Cristóbal A. Navarro, Héctor Ferrada, Felipe A. Quezada

Over the past decade, GPU technology has undergone a notable transformation, evolving from pure general-purpose computation to the integration of application-specific integrated circuits (ASICs), including Tensor Cores and Ray Tracing (RT) cores. Although these specialized GPU cores were initially developed to enhance specific domains such as AI and real-time rendering, recent research has successfully harnessed them to accelerate other tasks that traditionally relied on conventional GPU computing. One GPU task that has yet to find its way onto RT cores is the parallel processing of range minimum queries (RMQs), which is fundamental in fields such as information retrieval and pattern matching. Accelerating RMQs with RT cores would therefore benefit the many applications that rely heavily on this task. In this work we present RTXRMQ, a new approach that computes RMQs with RT cores. The main contribution is a geometric solution for RMQ in which each array element becomes a triangle that is placed according to the element's value and shaped according to its position in the array, such that the closest hit of a ray launched from a point given by the query parameters corresponds to the result of that query. Experimental results show that RTXRMQ is currently best suited for query ranges that are small relative to the input size, achieving speedups of up to […] and […] over parallel state-of-the-art CPU and GPU approaches, respectively. For medium and large query ranges, RTXRMQ is slower than the state-of-the-art GPU approach but remains competitive, being […] and […] faster than a state-of-the-art CPU method that also runs in parallel. Furthermore, performance scaling experiments across the latest RTX GPU architectures show that, if the current RT core scaling trend continues, RTXRMQ's performance would scale at a higher rate than that of the other compared approaches, making it an attractive tool for future high-performance applications that issue many batches of RMQs.
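The geometric idea in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' implementation and uses no RT cores or ray tracing API. It assumes a simplified scene in which the primitive for element `i` covers every query point `(l, r)` with `l <= i <= r` and sits at a depth equal to the element's value, so the nearest primitive hit by a ray fired from `(l, r)` along the value axis is the range minimum. The names `rmq_naive`, `build_scene`, and `closest_hit` are hypothetical, and ties are resolved to the leftmost index by assumption.

```python
from typing import List, Tuple


def rmq_naive(arr: List[float], l: int, r: int) -> int:
    """Reference answer: index of the leftmost minimum in arr[l..r], inclusive."""
    return min(range(l, r + 1), key=lambda i: arr[i])


def build_scene(arr: List[float]) -> List[Tuple[float, int]]:
    """One primitive per element, stored as (depth, index).

    Assumption of this sketch: the primitive for element i covers every
    query point (l, r) with l <= i <= r and is placed at depth arr[i].
    """
    return [(value, i) for i, value in enumerate(arr)]


def closest_hit(scene: List[Tuple[float, int]], l: int, r: int) -> int:
    """Software stand-in for a ray launched from (l, r) along the value axis.

    Scans every primitive linearly (real hardware would traverse an
    acceleration structure instead) and returns the index stored in the
    nearest primitive that covers the query point.
    """
    best = None
    for depth, i in scene:
        covers_query = l <= i <= r
        if covers_query and (best is None or (depth, i) < best):
            best = (depth, i)
    if best is None:
        raise ValueError("empty query range")
    return best[1]


if __name__ == "__main__":
    data = [5, 2, 7, 2, 9, 1, 4]
    scene = build_scene(data)
    for l, r in [(0, 4), (2, 4), (4, 6), (3, 3)]:
        print((l, r), closest_hit(scene, l, r), rmq_naive(data, l, r))
```

The linear scan only demonstrates why a closest-hit query answers an RMQ under these assumptions; it says nothing about the performance the abstract reports for RT cores.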

Updated: 2024-03-26