Biosequence Similarity Search on the Mercury System

Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

doi:10.1007/s11265-007-0087-0

Abstract

Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA
Praveen Krishnamurthy, Jeremy Buhler, Roger Chamberlain, Mark Franklin, Kwame Gyang, Arpith Jacob & Joseph Lancaster

Corresponding author

Correspondence to Praveen Krishnamurthy.

About this article

Cite this article

Krishnamurthy, P., Buhler, J., Chamberlain, R. et al. Biosequence Similarity Search on the Mercury System. J VLSI Sign Process Syst Sign Im 49, 101–121 (2007). https://doi.org/10.1007/s11265-007-0087-0

Received: 04 March 2005
Revised: 18 April 2006
Accepted: 09 August 2006
Published: 10 July 2007
Issue Date: October 2007
DOI: https://doi.org/10.1007/s11265-007-0087-0
