A lightweight BLASTP and its implementation on CUDA GPUs
The Journal of Supercomputing (IF 2.5), Pub Date: 2020-04-07, DOI: 10.1007/s11227-020-03267-1
Liang-Tsung Huang, Kai-Cheng Wei, Chao-Chin Wu, Chao-Yu Chen, Jian-An Wang

The BLAST server at the National Center for Biotechnology Information in the USA receives tens of thousands of queries per day on average. However, every query receives the same service even though query lengths vary significantly. In fact, a large portion of protein sequences are shorter than 500 residues. Moreover, the hit detection process consumes most of the execution time of BLAST, and its core data structure is a lookup table. For these reasons, we propose a lightweight BLASTP for servicing not-too-long queries, built around a hybrid query-index table. Each table entry consists of four bytes and can store up to three query positions, so a sequence word usually requires only one memory fetch to retrieve its hit information. Furthermore, additional dummy entries are embedded into the table and interleaved with the original entries. Both the entries without any hits and the dummy entries can be used to buffer spilled query positions. These features result in a much smaller lookup table with a higher utilization rate and a lower cache miss ratio. Experimental results show that the lightweight BLASTP outperforms CUDA-BLASTP, with speedups ranging from 1.82 to 3.37 on the first two critical phases.
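
To make the four-byte entry concrete, the following host-side C++ sketch packs up to three query positions into one 32-bit word. The bit layout (a 2-bit count followed by three 9-bit positions, enough for queries shorter than 512) and the names `pack_entry` and `unpack_entry` are assumptions chosen for illustration; the abstract does not specify the actual encoding used in the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (an assumption, not taken from the paper):
//   bits 0-1   : number of stored positions (0..3)
//   bits 2-10  : position 0
//   bits 11-19 : position 1
//   bits 20-28 : position 2
//   bits 29-31 : unused in this sketch
constexpr unsigned kPosBits = 9;  // 9 bits cover positions 0..511
constexpr std::uint32_t kPosMask = (1u << kPosBits) - 1u;
constexpr unsigned kCountBits = 2;
constexpr std::uint32_t kCountMask = (1u << kCountBits) - 1u;

// Pack at most three query positions into a single 4-byte entry.
std::uint32_t pack_entry(const std::vector<std::uint32_t>& positions) {
    assert(positions.size() <= 3);
    std::uint32_t entry = static_cast<std::uint32_t>(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        assert(positions[i] <= kPosMask);
        entry |= positions[i] << (kCountBits + i * kPosBits);
    }
    return entry;
}

// Recover the stored query positions from one 4-byte entry.
std::vector<std::uint32_t> unpack_entry(std::uint32_t entry) {
    const std::uint32_t count = entry & kCountMask;
    std::vector<std::uint32_t> positions;
    for (std::uint32_t i = 0; i < count; ++i) {
        positions.push_back((entry >> (kCountBits + i * kPosBits)) & kPosMask);
    }
    return positions;
}
```

With such a layout, a word with at most three hits is resolved by a single 32-bit read. How spilled positions are placed into empty or dummy entries is not described in the abstract, so the sketch leaves that part out.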

Updated: 2020-04-07