Abstract
The function of a protein is determined primarily by its structure and amino acid sequence. Many biological questions rely on accurately determining the group of structures to which the domains of a protein belong, which can be done by aligning and comparing protein structures. Dozens of Protein Structure Alignment (PSA) methods, based on a wide range of techniques, have been proposed. The aim of this study is to determine how well PSA methods identify pairs of protein domains known to share differing levels of structural similarity, and to assess their utility for clustering domains from several different folds into known groups. We present the results of a comprehensive investigation of eighteen PSA methods, to our knowledge the largest independent study on this topic. Overall, SP-AlignNS (non-sequential) was the best method for classification and among the best performing methods for clustering. Where possible, each method was split into the algorithm used to find the optimal alignment and the score used to assess similarity. This allowed us to largely separate each algorithm from the score it maximizes and thus to assess their effectiveness independently of each other. Surprisingly, some hybrids that pair a score with an algorithm other than its native one performed better at classification than either of the native methods, and in some cases at clustering as well. We hope that this investigation and the accompanying discussion will be useful to researchers selecting or designing methods to align protein structures.
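The separation of an alignment algorithm from the score it maximizes can be sketched with a dynamic-programming aligner that takes the residue-pair score as a parameter. This is a minimal illustration under stated assumptions, not any of the eighteen benchmarked methods: `dp_align` and `PairScore` are hypothetical names, the linear gap penalty is an assumption, and the toy score works on secondary-structure letters. Many real PSA scores (TM-score, for example) are not a simple sum of independent residue-pair terms, so practical methods typically iterate between alignment and superposition.

```python
from typing import Callable, List, Tuple

# Hypothetical type: score for aligning residue i of chain A with residue j of chain B
PairScore = Callable[[int, int], float]


def dp_align(n: int, m: int, score: PairScore,
             gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global (Needleman-Wunsch-style) alignment of chains of lengths n and m.

    The algorithm is fixed; the quantity it maximizes is whatever `score`
    returns, so scores can be swapped without changing the algorithm.
    """
    # f[i][j]: best total score aligning the first i residues of A with the first j of B
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + score(i - 1, j - 1),  # align residue i-1 with j-1
                f[i - 1][j] + gap,                      # residue i-1 of A unaligned
                f[i][j - 1] + gap,                      # residue j-1 of B unaligned
            )
    # Traceback: recover the aligned residue pairs (score must be deterministic)
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if f[i][j] == f[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return f[n][m], pairs


# Toy usage: secondary-structure strings (H = helix, E = strand, L = loop)
a, b = "HHHEEL", "HHEEL"


def sse_identity(i: int, j: int) -> float:
    """Hypothetical score: reward matching secondary-structure states."""
    return 2.0 if a[i] == b[j] else -1.0


total, pairs = dp_align(len(a), len(b), sse_identity)
```

Replacing `sse_identity` with another function of `(i, j)`, for instance a distance-based term computed after superposing the two chains, changes what is maximized while leaving the algorithm untouched. Pairing an algorithm with a score it was not originally designed for is the kind of mismatched hybrid the abstract describes.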
Data Availability
All alignment data is available through Dryad at https://doi.org/10.5061/dryad.c59zw3r4v.
Code Availability
Example code for calculating the scores is available through Dryad at https://doi.org/10.5061/dryad.c59zw3r4v.
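As an illustration of what calculating an alignment score involves, the TM-score of Zhang and Skolnick, widely used in protein structure alignment, can be sketched for a given alignment. For a target of length L it sums 1/(1 + (d_i/d0)^2) over aligned pairs and divides by L, with d0 = 1.24 (L - 15)^(1/3) - 1.8 angstroms. This is a hedged sketch and not the Dryad code referred to above: `tm_d0` and `tm_score_fixed` are hypothetical names, the coordinates are assumed to be Cα positions that are already superposed, and the 0.5 Å floor on d0 for short chains is an assumption. Because the true TM-score is maximized over all superpositions, this returns a lower bound for one fixed superposition.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def tm_d0(length: int) -> float:
    """TM-score distance scale d0 (angstroms) for a target of `length` residues."""
    if length <= 15:
        return 0.5  # formula is undefined or negative for very short chains (assumed floor)
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(coords_a: Sequence[Point], coords_b: Sequence[Point],
                   pairs: Sequence[Tuple[int, int]], target_length: int) -> float:
    """TM-score-style value of an alignment under ONE fixed superposition.

    coords_a and coords_b are assumed to be already superposed C-alpha
    coordinates; pairs lists aligned (index in A, index in B) residues.
    """
    d0 = tm_d0(target_length)
    total = 0.0
    for i, j in pairs:
        d = math.dist(coords_a[i], coords_b[j])  # distance between aligned residues
        total += 1.0 / (1.0 + (d / d0) ** 2)
    # Normalize by the target length, so unaligned residues lower the score
    return total / target_length


# Toy usage: a chain compared with itself, every residue aligned
chain = [(3.8 * k, 0.0, 0.0) for k in range(20)]
print(tm_score_fixed(chain, chain, [(k, k) for k in range(20)], target_length=20))  # 1.0
```

Normalizing by the target length, rather than by the number of aligned pairs, is what makes the score penalize alignments that leave much of the target unmatched.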
Funding
J Sykes is supported through an SET Research Training Program (RTP) Stipend.
Author information
Contributions
JS conducted research and analysis and prepared the manuscript. BH and MC provided supervision and editorial help.
Additional information
Handling editor: Arndt von Haeseler.
Cite this article
Sykes, J., Holland, B. & Charleston, M. Benchmarking Methods of Protein Structure Alignment. J Mol Evol 88, 575–597 (2020). https://doi.org/10.1007/s00239-020-09960-2
DOI: https://doi.org/10.1007/s00239-020-09960-2