CEGSO: Boosting Essential Proteins Prediction by Integrating Protein Complex, Gene Expression, Gene Ontology, Subcellular Localization and Orthology Information

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Essential proteins are regarded as indispensable for sustaining normal physiological function and are crucial to drug design and disease diagnosis. Discovering them is therefore of great importance in revealing molecular mechanisms and biological processes. Because biological experiments are tedious, many computational methods have been developed to discover key proteins by mining features of high-throughput data. Appropriate integration of different kinds of biological information with the protein–protein interaction (PPI) network has proven useful in predicting essential proteins. The main aim of this research is to provide a comprehensive study and review of identifying essential proteins by integrating multi-source data, and to offer guidance for researchers. Current essential protein prediction algorithms are analyzed and compared in detail and tested on benchmark PPI networks. In addition, building on the previous method TEGS (short for network Topology, gene Expression, Gene ontology, and Subcellular localization), we propose CEGSO, which improves the prediction of essential proteins by incorporating known protein complex information, gene expression profiles, Gene Ontology (GO) term information, subcellular localization information, and protein orthology data into the PPI network. The simulation results show that CEGSO achieves more accurate and robust results than the other compared methods across different test datasets and evaluation measures.
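The general idea of multi-source integration can be illustrated with a small sketch. It is not the CEGSO scoring formula, which the abstract does not give. It assumes, purely for illustration, that each data source yields one score per protein, that scores are min-max normalized, and that they are combined by a weighted sum. The function names (`min_max`, `rank_proteins`, `top_k_hits`), the source names, the weights, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {p: 0.0 for p in scores}
    return {p: (s - lo) / (hi - lo) for p, s in scores.items()}


def rank_proteins(sources: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> List[str]:
    """Rank proteins by a weighted sum of normalized per-source scores.

    Assumption: a protein missing from a source contributes 0 for it.
    """
    normalized = {name: min_max(scores) for name, scores in sources.items()}
    proteins = set().union(*(scores.keys() for scores in sources.values()))
    combined = {
        p: sum(weights[name] * normalized[name].get(p, 0.0) for name in sources)
        for p in proteins
    }
    # Descending combined score; ties broken by name so the order is stable.
    return sorted(proteins, key=lambda p: (-combined[p], p))


def top_k_hits(ranking: List[str], essential: Set[str], k: int) -> int:
    """Count how many of the top-k ranked proteins are known essentials."""
    return sum(1 for p in ranking[:k] if p in essential)


# Toy, made-up scores for four hypothetical proteins.
sources = {
    "topology": {"P1": 5.0, "P2": 3.0, "P3": 1.0, "P4": 1.0},
    "orthology": {"P1": 0.9, "P2": 0.1, "P3": 0.8, "P4": 0.0},
}
weights = {"topology": 0.5, "orthology": 0.5}
ranking = rank_proteins(sources, weights)
hits = top_k_hits(ranking, essential={"P1", "P3"}, k=2)
```

Counting known essential proteins among the top-k candidates is one common way to compare ranking methods. The evaluation measures actually used in the paper are not listed in the abstract.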

Acknowledgements

This work was partly supported by the National Natural Science Foundation (No. 61802125, No. 61763010, No. 61862026 and No. 61862025), the Natural Science Foundation of Jiangxi Province (No. 20181BAB202006 and No. 20161BAB211022), the Guangxi “BAGUI Scholar” Program under Grant [2016] 127, and the Science and Technology Major Project of Guangxi under Grant AA18118047. The authors would like to thank Dr. John Kalivas of the Chemistry Department of Idaho State University for providing the simulation code for SRD.

Author information

Corresponding authors

Correspondence to Wei Zhang or Chengwang Xie.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhang, W., Xue, X., Xie, C. et al. CEGSO: Boosting Essential Proteins Prediction by Integrating Protein Complex, Gene Expression, Gene Ontology, Subcellular Localization and Orthology Information. Interdiscip Sci Comput Life Sci 13, 349–361 (2021). https://doi.org/10.1007/s12539-021-00426-7

  • DOI: https://doi.org/10.1007/s12539-021-00426-7