Abstract
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11 to 15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results.
Index Terms
- Mercury BLASTP: Accelerating Protein Sequence Alignment