Parallel Fine-Grained Comparison of Long DNA Sequences in Homogeneous and Heterogeneous GPU Platforms With Pruning
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2021-05-26, DOI: 10.1109/tpds.2021.3084069
Marco Figueiredo, Joao Paulo Navarro, Edans F. O. Sandes, George Teodoro, Alba C. M. A. Melo

The parallelization of Smith-Waterman (SW) sequence comparison tools for long DNA sequences has been a major challenge over the years, requiring the use of several devices and sophisticated optimizations. Pruning is one of these optimizations, and it can considerably reduce the amount of computation. This article proposes MultiBP, a sequence comparison solution for multiple GPUs with block pruning. Two MultiBP strategies are proposed. In the static score-sharing strategy, the workload is statically distributed among the GPUs, and the best score is sent to neighboring GPUs to simulate a global view. In the dynamic strategy, execution is divided into cycles and the workload is assigned dynamically according to each GPU's processing rate. MultiBP was integrated into MASA-CUDAlign and tested on homogeneous and heterogeneous platforms with different NVIDIA GPU architectures. On our homogeneous and heterogeneous platforms, the best results were mostly obtained by the static and dynamic approaches, respectively. We also show that our decision module selects the best strategy in most cases. Finally, the comparison of human and chimpanzee chromosome 1 on a cluster with 512 NVIDIA V100 GPUs took 11 minutes and reached 82,822 GCUPS (billions of cell updates per second), which is, to our knowledge, the best performance reported for SW tools on GPUs.
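The abstract does not spell out MultiBP's pruning rule, so the following is only a minimal, single-threaded Python sketch of the general idea behind block pruning in Smith-Waterman, not the authors' method. Assumptions not taken from the paper: a linear gap penalty, the hypothetical scores `MATCH`, `MISMATCH` and `GAP`, row-major block order, and an upper bound of "border score plus `MATCH` times the remaining diagonal length". The function names `sw_block_pruned` and `sw_full` are hypothetical, and nothing here models GPUs, score sharing, or MASA-CUDAlign.

```python
"""Block-pruned Smith-Waterman: a minimal, single-threaded sketch.

Assumed (not from the paper): linear gap penalty, the scores below,
row-major block order, and the pruning bound used in sw_block_pruned.
"""

MATCH, MISMATCH, GAP = 1, -3, -2  # hypothetical scoring scheme


def sw_full(a: str, b: str) -> int:
    """Reference Smith-Waterman score computed over the whole matrix."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def sw_block_pruned(a: str, b: str, block: int = 4) -> tuple[int, int, int]:
    """Return (best_score, blocks_computed, blocks_pruned)."""
    m, n = len(a), len(b)
    # H[i][j]: best local-alignment score ending at a[i-1], b[j-1].
    # Row 0, column 0 and the cells of pruned blocks stay at 0.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = computed = pruned = 0
    for r0 in range(1, m + 1, block):
        r1 = min(r0 + block - 1, m)
        for c0 in range(1, n + 1, block):
            c1 = min(c0 + block - 1, n)
            # An alignment starting inside the block has at most
            # min(m - r0 + 1, n - c0 + 1) matched pairs, each worth MATCH.
            bound = MATCH * min(m - r0 + 1, n - c0 + 1)
            # An alignment entering through the input border (row r0-1 and
            # column c0-1, corner included) gains at most MATCH per diagonal
            # step from that border cell to the end of the matrix.
            border = [(r0 - 1, j) for j in range(c0 - 1, c1 + 1)]
            border += [(i, c0 - 1) for i in range(r0, r1 + 1)]
            for i, j in border:
                bound = max(bound, H[i][j] + MATCH * min(m - i, n - j))
            if bound <= best:
                pruned += 1  # no alignment through this block can beat best
                continue
            computed += 1
            for i in range(r0, r1 + 1):
                for j in range(c0, c1 + 1):
                    s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
                    H[i][j] = max(0, H[i - 1][j - 1] + s,
                                  H[i - 1][j] + GAP, H[i][j - 1] + GAP)
                    best = max(best, H[i][j])
    return best, computed, pruned


if __name__ == "__main__":
    seq = "ACGTACGTTTGACCA" * 4
    print(sw_block_pruned(seq, seq), sw_full(seq, seq))
```

Because a block is skipped only when its bound cannot exceed `best`, both functions return the same score; the saving grows the earlier a high `best` is found. This plausibly illustrates why sending the best score to neighboring GPUs, as in the static score-sharing strategy, can help: a higher `best` makes `bound <= best` hold for more blocks.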

Updated: 2021-06-22