Abstract
High-performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features come at a cost: significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large and multidimensional, and they must integrate with both simulation software and external data sites. Here, we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server.
Cite this article
Simms, A.M., Daggett, V. Protein simulation data in the relational model. J Supercomput 62, 150–173 (2012). https://doi.org/10.1007/s11227-011-0692-3
DOI: https://doi.org/10.1007/s11227-011-0692-3