Frequency spectra characterization of noncoding human genomic sequences

  • Research Article
Genes & Genomics

Abstract

Background

Noncoding sequences have been demonstrated to possess regulatory functions. Their classification is challenging because they do not show well-defined nucleotide patterns that correlate with their biological functions. Genomic signal processing techniques such as the Fourier transform have been employed to characterize coding and noncoding sequences. Applying this transformation to a systematic whole-genome noncoding library, such as the ENCODE database, can provide evidence of periodic behaviour in noncoding sequences that correlates with their regulatory functions.
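As an illustration of how a Fourier transform can expose periodic behaviour in a nucleotide sequence, the following minimal sketch computes a power spectrum from binary indicator sequences (one per nucleotide, often called the Voss representation). The abstract does not state which numerical representation the authors used, so the representation, the function names, and the toy sequence are all assumptions made for illustration.

```python
import cmath

def indicator_sequences(seq):
    """Assumed Voss-style mapping: one binary indicator sequence per nucleotide."""
    seq = seq.upper()
    return {base: [1.0 if s == base else 0.0 for s in seq] for base in "ACGT"}

def dft(signal):
    """Naive O(N^2) discrete Fourier transform (adequate for short sequences)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def power_spectrum(seq):
    """Sum of squared DFT magnitudes over the four indicator sequences."""
    total = [0.0] * len(seq)
    for signal in indicator_sequences(seq).values():
        for k, coeff in enumerate(dft(signal)):
            total[k] += abs(coeff) ** 2
    return total

# Hypothetical toy sequence with an exact period of 3 (not ENCODE data).
toy = "ATG" * 20
spectrum = power_spectrum(toy)
n = len(toy)

# Skip k = 0 (the DC term) and search the first half of the spectrum.
peak = max(range(1, n // 2 + 1), key=lambda k: spectrum[k])
print("peak index:", peak, "period:", n / peak)
```

For this toy input the dominant component sits at index N/3, which corresponds to a period of three bases. Real noncoding sequences would not be expected to show such a clean peak.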

Objective

The objective of this study was to classify different noncoding regulatory regions through their frequency spectra.

Methods

We applied machine learning algorithms to classify the frequency spectra of noncoding regulatory sequences.
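The general idea of classifying sequences by their frequency spectra can be sketched as follows. This is not the authors' pipeline: to stay dependency-free, a simple nearest-centroid classifier stands in for the machine learning algorithms used in the study, and the class labels, training sequences, and function names are hypothetical.

```python
import cmath
import math

def power_spectrum(seq):
    """Normalised power spectrum summed over four nucleotide indicator sequences."""
    n = len(seq)
    spectrum = []
    for k in range(1, n // 2 + 1):  # skip the DC term at k = 0
        power = 0.0
        for base in "ACGT":
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(seq) if s == base
            )
            power += abs(coeff) ** 2
        spectrum.append(power)
    total = sum(spectrum) or 1.0  # avoid division by zero for flat spectra
    return [p / total for p in spectrum]

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fit(labelled):
    """labelled: dict mapping a class label to a list of equal-length sequences."""
    return {label: centroid([power_spectrum(s) for s in seqs])
            for label, seqs in labelled.items()}

def predict(model, seq):
    """Return the label whose centroid is closest in Euclidean distance."""
    features = power_spectrum(seq)
    return min(model, key=lambda label: math.dist(features, model[label]))

# Hypothetical toy classes: period-3 versus period-4 repeats, all of length 24.
training = {
    "period3": ["ATG" * 8, "CCA" * 8, "GTT" * 8],
    "period4": ["ATGC" * 6, "CCAT" * 6, "GTTA" * 6],
}
model = fit(training)
print(predict(model, "TGA" * 8))   # held-out period-3 repeat
print(predict(model, "TGAC" * 6))  # held-out period-4 repeat
```

All sequences here share one length so that their spectra have the same number of components; sequences of varying length would need resampling or binning of the spectrum before comparison.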

Results

Sequences from different regulatory regions, cell lines, and chromosomes possessed distinct frequency spectra, and machine learning classifiers (such as support vector machines) could successfully discriminate among regulatory regions, thus correlating the frequency spectra with their biological functions.

Conclusion

Our work supports the idea that there are patterns in the noncoding sequences of the genome.

Author information

Corresponding author

Correspondence to J. Alejandro Morales.

Ethics declarations

Conflict of interest

The authors (OP, RRV, IRG, HVP, RASR, and JAM) declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Paredes, O., Romo-Vázquez, R., Román-Godínez, I. et al. Frequency spectra characterization of noncoding human genomic sequences. Genes Genom 42, 1215–1226 (2020). https://doi.org/10.1007/s13258-020-00980-2

  • DOI: https://doi.org/10.1007/s13258-020-00980-2
