Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search,arXiv - CS - Hardware Architecture

arXiv - CS - Hardware Architecture. Pub Date: 2021-09-13, DOI: arxiv-2109.06355
Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott A. Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, Caiwen Ding

Molecular similarity search is widely used in drug discovery to rapidly identify structurally similar compounds in large molecular databases. As chemical libraries grow, there is increasing interest in efficiently accelerating large-scale similarity search. Existing work mainly uses CPUs and GPUs to accelerate computation of the Tanimoto coefficient, which measures the pairwise similarity between molecular fingerprints. In this paper, we propose and optimize an FPGA-based accelerator design for exhaustive and approximate search algorithms. For exhaustive search using BitBound & folding, we analyze how the similarity cutoff and folding level relate to search speedup and accuracy, and propose a scalable on-the-fly query engine on FPGAs that reduces resource utilization and the pipeline interval. We achieve a processing throughput of 450 million compounds per second for a single query engine. For approximate search, we use hierarchical navigable small world (HNSW), a popular algorithm with high recall and query speed, and propose an FPGA-based graph traversal engine that uses a high-throughput register-array-based priority queue and a fine-grained distance calculation engine to increase processing capability. Experimental results show that the proposed FPGA-based HNSW implementation achieves 103,385 queries per second (QPS) on the ChEMBL database with 0.92 recall, an average 35x speedup over the existing CPU implementation. To the best of our knowledge, ours is the first attempt to accelerate molecular similarity search algorithms on FPGAs, and it has the highest performance among existing approaches.
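
To make the exhaustive-search idea concrete, the following is a minimal software sketch of Tanimoto scoring with BitBound-style pruning. It is not the paper's FPGA design: the function names (`popcount`, `tanimoto`, `bitbound_search`) are illustrative, Python integers stand in for binary fingerprints, and folding is left out. The pruning relies on the bound T(A, B) <= min(a, b) / max(a, b), where a and b are the numbers of set bits, so for a cutoff t only candidates with a*t <= b <= a/t need to be scored.

```python
from bisect import bisect_left, bisect_right


def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0
    return popcount(a & b) / union


def bitbound_search(query: int, database: list[int], cutoff: float):
    """Return (index, score) pairs with score >= cutoff, best first.

    Assumes 0 < cutoff <= 1. The database is ordered by popcount so the
    candidates that can possibly reach the cutoff form one contiguous range.
    """
    entries = sorted((popcount(fp), idx) for idx, fp in enumerate(database))
    counts = [count for count, _ in entries]
    q_bits = popcount(query)

    # BitBound window: q_bits * cutoff <= popcount <= q_bits / cutoff
    lo = bisect_left(counts, q_bits * cutoff)
    hi = bisect_right(counts, q_bits / cutoff)

    hits = []
    for _, idx in entries[lo:hi]:
        score = tanimoto(query, database[idx])
        if score >= cutoff:
            hits.append((idx, score))
    # Highest score first; ties broken by database index
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits


if __name__ == "__main__":
    db = [0b1111, 0b1110, 0b0001, 0b11111111, 0b1011]
    for idx, score in bitbound_search(0b1111, db, cutoff=0.7):
        print(idx, round(score, 3))
```

In this toy database, the fingerprints with 1 and 8 set bits fall outside the popcount window for a 4-bit query at cutoff 0.7 and are skipped without computing their Tanimoto score. In a real system the popcount ordering would be precomputed once rather than rebuilt per query.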

Updated: 2021-09-15
