Sequence-Based Prediction of Transmembrane Protein Crystallization Propensity

Zhu, Qizhi; Wang, Lihua; Dai, Ruyu; Zhang, Wei; Tang, Wending; Bin, Yannan; Wang, Zeliang; Xia, Junfeng

doi:10.1007/s12539-021-00448-1

Original research article
Published: 18 June 2021

Volume 13, pages 693–702 (2021)
Interdisciplinary Sciences: Computational Life Sciences

Qizhi Zhu¹,² (equal contribution),
Lihua Wang¹,² (equal contribution),
Ruyu Dai²,
Wei Zhang²,
Wending Tang²,
Yannan Bin²,
Zeliang Wang¹ (ORCID: orcid.org/0000-0001-5382-4300) &
Junfeng Xia²

Abstract

Transmembrane proteins play a vital role in cellular processes. Several techniques can determine transmembrane protein structures, and X-ray crystallography is the primary one. However, owing to the special properties of transmembrane proteins, determining their structures by X-ray crystallography remains difficult. To reduce experimental cost and improve efficiency, it is valuable to develop computational methods that predict the crystallization propensity of transmembrane proteins. In this work, we propose a sequence-based machine learning method, Prediction of TransMembrane protein Crystallization propensity (PTMC). First, we extracted several general sequence features together with specifically encoded features of relative solvent accessibility and hydrophobicity. Second, we applied feature selection to filter out redundant and irrelevant features; the resulting optimal feature subset consists of hydrophobicity, amino acid composition and relative solvent accessibility. Finally, we selected extreme gradient boosting as the classifier after comparing it with several other machine learning methods. Results on the independent test set indicate that PTMC outperforms state-of-the-art sequence-based methods in terms of sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUC). Compared with two competitors, BCrystal and TMCrys, PTMC improves sensitivity by 0.132 and 0.179, specificity by 0.014 and 0.127, accuracy by 0.037 and 0.192, MCC by 0.128 and 0.362, and AUC by 0.027 and 0.125, respectively.
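The five evaluation measures named in the abstract have standard definitions for a binary classifier. The following is a minimal sketch of those definitions in plain Python, not the PTMC implementation; the function names `binary_metrics` and `roc_auc` and the example counts are illustrative assumptions and do not come from the article.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn count crystallizable proteins predicted correctly/incorrectly;
    tn/fp count non-crystallizable proteins predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # MCC is conventionally reported as 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (the Mann-Whitney formulation). This is an
    O(n^2) illustration, adequate only for small test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced, which is one reason it is commonly reported alongside accuracy.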

Availability of data and material

The code and data for PTMC are available at https://github.com/xialab-ahu/PTMC.

Code availability

The code and data for PTMC are available at https://github.com/xialab-ahu/PTMC.

Acknowledgements

The authors thank the members of our laboratory for their valuable discussions.

Funding

This work was supported by the Anhui Provincial Outstanding Young Talent Support Plan (gxyq2018083) and the National Natural Science Foundation of China (62072003, 11835014, and U19A2064).

Author information

Qizhi Zhu and Lihua Wang contributed equally to this work.

Authors and Affiliations

School of Information Engineering, Huangshan University, Huangshan, 245041, China
Qizhi Zhu, Lihua Wang & Zeliang Wang
Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China
Qizhi Zhu, Lihua Wang, Ruyu Dai, Wei Zhang, Wending Tang, Yannan Bin & Junfeng Xia

Corresponding authors

Correspondence to Zeliang Wang or Junfeng Xia.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Zhu, Q., Wang, L., Dai, R. et al. Sequence-Based Prediction of Transmembrane Protein Crystallization Propensity. Interdiscip Sci Comput Life Sci 13, 693–702 (2021). https://doi.org/10.1007/s12539-021-00448-1

Received: 28 December 2020
Revised: 31 May 2021
Accepted: 04 June 2021
Published: 18 June 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s12539-021-00448-1
