Abstract
Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Accurately predicting a protein's structure is of paramount importance to the scientific community, because a protein's structure governs its function. It is also one of the most complicated optimization problems that computational biologists have ever faced. Experimental methods for determining protein structure include X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and electron microscopy. All of these are tedious, time-consuming procedures that require expertise. To make the process less cumbersome, scientists use computational prediction tools that draw on data consolidated in protein repositories. In recent years, machine learning approaches have attracted growing interest from the structure prediction community. Most machine learning approaches to protein structure prediction are centred on co-evolution-based methods, whose accuracy depends on the number of homologous protein sequences available in the databases. Prediction therefore becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods can extract intricate features from protein sequence data without relying on human intuition or hand-crafted assumptions. Accurately predicted protein structures are used in drug discovery, antibody design, and the study of protein–protein interactions and of interactions with other molecules. This article reviews conventional and deep learning approaches to protein structure prediction. We conclude by outlining a few publicly available datasets and the deep learning architectures currently employed for protein structure prediction tasks.
Data Availability
Not applicable.
Code Availability
Not applicable.
Acknowledgements
We apologize to authors whose papers could not be cited in this review due to space constraints.
Funding
This work is part of V. A. Jisna's PhD research at the National Institute of Technology, Calicut, India. The research is funded by the Ministry of Human Resource Development, India.
Author information
Contributions
VAJ reviewed the papers and wrote the manuscript. PBJ supervised the work. The final manuscript was read and approved by all authors.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Jisna, V.A., Jayaraj, P.B. Protein Structure Prediction: Conventional and Deep Learning Perspectives. Protein J 40, 522–544 (2021). https://doi.org/10.1007/s10930-021-10003-y
DOI: https://doi.org/10.1007/s10930-021-10003-y