research-article

Mercury BLASTP: Accelerating Protein Sequence Alignment

Published: 01 June 2008

Abstract

Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11–15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results.



      • Published in

        ACM Transactions on Reconfigurable Technology and Systems  Volume 1, Issue 2
        June 2008
        143 pages
        ISSN:1936-7406
        EISSN:1936-7414
        DOI:10.1145/1371579

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 June 2008
        • Accepted: 1 March 2008
        • Revised: 1 January 2008
        • Received: 1 August 2007


        Qualifiers

        • Research
        • Refereed
