BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs.
BMC Bioinformatics (IF 2.9), Pub Date: 2020-03-11, DOI: 10.1186/s12859-020-3348-6
Jan Fostier

BACKGROUND: The identification of all matches of a large set of position weight matrices (PWMs) in long DNA sequences requires significant computational resources for which a number of efficient yet complex algorithms have been proposed.

RESULTS: We propose BLAMM, a simple and efficient tool inspired by high performance computing techniques. The workload is expressed in terms of matrix-matrix products that are evaluated with high efficiency using optimized BLAS library implementations. The algorithm is easy to parallelize and implement on CPUs and GPUs and has a runtime that is independent of the selected p-value. In terms of single-core performance, it is competitive with state-of-the-art software for PWM matching while being much more efficient when using multithreading. Additionally, BLAMM requires negligible memory. For example, both strands of the entire human genome can be scanned for 1404 PWMs in the JASPAR database in 13 min with a p-value of 10^-4 using a 36-core machine. On a dual GPU system, the same task can be performed in under 5 min.

CONCLUSIONS: BLAMM is an efficient tool for identifying PWM matches in large DNA sequences. Its C++ source code is available under the GNU General Public License Version 3 at https://github.com/biointec/blamm.
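The abstract states that the workload is expressed as matrix-matrix products. The following is a minimal sketch of one way such a formulation can look, not BLAMM's actual implementation: each sequence window is one-hot encoded into a row, each PWM is flattened into a column, and a single product yields every window-versus-PWM score. All function names, the zero-padding of shorter PWMs, and the naive `matmul` (standing in for an optimized BLAS gemm call) are assumptions made for illustration.

```python
# Illustrative sketch only: PWM scanning written as one matrix-matrix product.
# Function names and data layout are hypothetical, not taken from BLAMM.
BASES = "ACGT"


def one_hot_windows(seq, width):
    """One row per window; each row is a flattened one-hot encoding (4 * width entries)."""
    seq = seq.upper()
    rows = []
    for start in range(len(seq) - width + 1):
        row = [0.0] * (4 * width)
        for offset, base in enumerate(seq[start:start + width]):
            column = BASES.find(base)
            if column >= 0:  # unknown bases such as N contribute a score of zero
                row[4 * offset + column] = 1.0
        rows.append(row)
    return rows


def flatten_pwm(pwm, width):
    """Flatten per-position [A, C, G, T] scores, zero-padded to `width` positions."""
    flat = [score for position in pwm for score in position]
    return flat + [0.0] * (4 * (width - len(pwm)))


def matmul(a, b):
    """Naive (n x k) times (k x m) product; an optimized BLAS gemm would replace this."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def scan(seq, pwms, thresholds):
    """Return (window start, PWM index, score) for every score at or above its threshold."""
    width = max(len(pwm) for pwm in pwms)
    windows = one_hot_windows(seq, width)                # n_windows x 4*width
    flat = [flatten_pwm(pwm, width) for pwm in pwms]     # n_pwms x 4*width
    pwm_matrix = [list(col) for col in zip(*flat)]       # 4*width x n_pwms
    scores = matmul(windows, pwm_matrix)                 # n_windows x n_pwms
    hits = []
    for start, row in enumerate(scores):
        for index, score in enumerate(row):
            if score >= thresholds[index]:
                hits.append((start, index, score))
    return hits


if __name__ == "__main__":
    toy_pwms = [
        [[1, 0, 0, 0], [0, 1, 0, 0]],  # rewards "AC"
        [[0, 0, 1, 0], [0, 0, 0, 1]],  # rewards "GT"
    ]
    print(scan("ACGTAC", toy_pwms, thresholds=[2, 2]))
```

In this sketch every score is computed before the threshold is applied, so the cost of the product does not depend on how strict the threshold is, which is consistent with the abstract's remark about runtime being independent of the p-value. Because shorter PWMs are zero-padded to the longest width, windows at the very end of the sequence that only a shorter PWM could fit are not scored here; a real tool would need to handle that case.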

Updated: 2020-03-16