
GSCNN: a composition of CNN and Gibb Sampling computational strategy for predicting promoter in bacterial genomes

  • Original Research
International Journal of Information Technology

Abstract

This paper presents a new computational strategy for finding promoter sequences, the transcription start signals in DNA without which a gene cannot be activated for protein synthesis. Genes are the hereditary basis of how living organisms function, and a sequence of steps is needed to uncover the protein production process encoded in them. Promoter prediction is therefore essential for decoding genetic information correctly during protein synthesis, and it also helps biologists identify genetic malfunctions. Numerous computational methods have been proposed to support biologists in related tasks. Earlier works applied Gibbs sampling, Markov chain models and machine learning strategies to predict promoters. This paper introduces GSCNN, which combines Gibbs sampling with a Convolutional Neural Network (CNN). The model uses Gibbs sampling to select the best profile features of the DNA and, with the help of a position frequency matrix, extracts their scores as a feature vector. The extracted feature vectors are then fed into the CNN to predict sigma-54 (σ54) promoters in bacterial genomes. The performance of the proposed system is evaluated on the Bacillus subtilis NC_000964 dataset. Sigma-54 promoter prediction with the proposed GSCNN method achieved a significant improvement in performance metrics over traditional machine learning algorithms such as Decision Tree (DT), K-Nearest Neighbor (KNN), Random Forest (RF) and Support Vector Machine (SVM).
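The feature-extraction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no parameters, so the function names, the pseudocount, the uniform background of 0.25 and the toy sequences are all assumptions made for illustration. The sketch runs a basic Gibbs sampler to pick one motif site per sequence, builds a position frequency matrix from those sites, and scores every window of a sequence to obtain a feature vector of the kind that could be passed to a CNN.

```python
import math
import random

ALPHABET = "ACGT"


def position_frequency_matrix(sites, pseudocount=1.0):
    """Per-column base frequencies for equal-length sites (pseudocount is an assumption)."""
    width = len(sites[0])
    pfm = []
    for col in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pfm.append({base: counts[base] / total for base in ALPHABET})
    return pfm


def log_odds(window, pfm, background=0.25):
    """Log2 odds of a window under the matrix versus a uniform background (assumed)."""
    return sum(math.log2(pfm[i][base] / background) for i, base in enumerate(window))


def window_scores(seq, pfm):
    """Score every window of the matrix width; the list serves as a feature vector."""
    width = len(pfm)
    return [log_odds(seq[i:i + width], pfm) for i in range(len(seq) - width + 1)]


def gibbs_sample(seqs, width, iterations=200, seed=0):
    """Basic Gibbs motif sampler: returns one motif start per sequence (needs >= 2 sequences)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held in range(len(seqs)):
            # Build the profile from every sequence except the held-out one.
            others = [s[p:p + width]
                      for k, (s, p) in enumerate(zip(seqs, starts)) if k != held]
            pfm = position_frequency_matrix(others)
            # Resample the held-out start in proportion to its likelihood ratio.
            weights = [2.0 ** score for score in window_scores(seqs[held], pfm)]
            starts[held] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    # Toy sequences, not taken from the Bacillus subtilis dataset.
    toy = ["TTGGCACGAA", "CCTGGCACTT", "ATGGCACGGT"]
    starts = gibbs_sample(toy, width=6)
    sites = [s[p:p + 6] for s, p in zip(toy, starts)]
    features = window_scores(toy[0], position_frequency_matrix(sites))
    print(starts, [round(f, 2) for f in features])
```

Because the sampler is stochastic, the chosen sites depend on the seed and the number of iterations. Each feature vector has one score per window, so sequences of equal length yield vectors of equal length, which is convenient for a convolutional input layer.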



Author information

Corresponding author

Correspondence to Sasikala S.


About this article


Cite this article

Sasikala, S., Ratha Jeyalakshmi, T. GSCNN: a composition of CNN and Gibb Sampling computational strategy for predicting promoter in bacterial genomes. Int. j. inf. tecnol. 13, 493–499 (2021). https://doi.org/10.1007/s41870-020-00565-y


  • DOI: https://doi.org/10.1007/s41870-020-00565-y
