RDMA data transfer and GPU acceleration methods for high-throughput online processing of serial crystallography images.
Journal of Synchrotron Radiation (IF 2.5), Pub Date: 2020-07-31, DOI: 10.1107/s1600577520008140
Raphael Ponsard 1 , Nicolas Janvier 1 , Jerome Kieffer 1 , Dominique Houzet 2 , Vincent Fristot 2

The continual evolution of photon sources and high-performance detectors drives cutting-edge experiments that produce very high-throughput data streams and data volumes that are challenging to manage and store. In these cases, efficient data transfer and processing architectures that allow online image correction, data reduction or compression become essential. This work investigates technical options and methods for data placement from the detector head to the processing computing infrastructure, taking into account the particularities of modern modular high-performance detectors. To compare realistic figures, the future ESRF beamline dedicated to macromolecular X-ray crystallography, EBSL8, is taken as an example; it will use a PSI JUNGFRAU 4M detector generating up to 16 GB of data per second, operating continuously for several minutes. Although such an experiment seems possible at the target speed with the 100 Gb s⁻¹ network cards that are currently available, the simulations highlight some potential bottlenecks when using a traditional software stack. An evaluation is presented of solutions that implement remote direct memory access (RDMA) over converged Ethernet techniques. A synchronization mechanism is proposed between an RDMA network interface card (RNIC) and a graphics processing unit (GPU) accelerator in charge of the online data processing. The placement of the detector images onto the GPU is made to overlap with the computation, potentially hiding the transfer latencies. As a proof of concept, a detector simulator and a backend GPU receiver with a rejection and compression algorithm suitable for a synchrotron serial crystallography (SSX) experiment are developed. It is concluded that the available transfer throughput from the RNIC to the GPU accelerator is at present the major bottleneck in online processing for SSX experiments.
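The figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope illustration, not the authors' method: it takes the 16 GB per second detector rate and the 100 Gb s⁻¹ link rate from the abstract, and it assumes decimal units, no protocol overhead, and a hypothetical run length of 5 minutes standing in for "several minutes". The function names are made up for this example.

```python
import math

# Figures taken from the abstract.
DETECTOR_RATE_GBYTE_PER_S = 16   # up to 16 GB of data per second
LINK_RATE_GBIT_PER_S = 100       # 100 Gb/s network cards

# Assumption for illustration only: "several minutes" taken as 5 minutes.
RUN_MINUTES = 5


def links_needed(rate_gbyte_s, link_gbit_s):
    """Minimum number of links to carry the stream.

    Assumes decimal units (1 GB = 8 Gb) and ignores protocol overhead,
    so the result is an optimistic lower bound.
    """
    return math.ceil(rate_gbyte_s * 8 / link_gbit_s)


def run_volume_tb(rate_gbyte_s, minutes):
    """Total data volume in TB for a continuous run (1 TB = 1000 GB)."""
    return rate_gbyte_s * minutes * 60 / 1000


if __name__ == "__main__":
    rate_gbit = DETECTOR_RATE_GBYTE_PER_S * 8
    print(f"Detector stream: {rate_gbit} Gb/s")
    print(f"100 Gb/s links needed: "
          f"{links_needed(DETECTOR_RATE_GBYTE_PER_S, LINK_RATE_GBIT_PER_S)}")
    print(f"Volume over {RUN_MINUTES} min: "
          f"{run_volume_tb(DETECTOR_RATE_GBYTE_PER_S, RUN_MINUTES):.1f} TB")
```

Under these assumptions the peak stream is 128 Gb/s, so it does not fit on a single 100 Gb s⁻¹ link and would have to be spread over at least two, which is consistent with the abstract's reference to network cards in the plural.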

Updated: 2020-07-31