Classifying single nucleotide polymorphisms in humans

  • Original Article
Molecular Genetics and Genomics

Abstract

Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in the human population and are key to personalized medicine. New tests are presented to distinguish pathogenic (malign) SNPs, i.e., those likely to cause or contribute to a disease, from nonpathogenic (benign) SNPs, regardless of whether they occur in coding (exon) or noncoding (intron) regions of the human genome. The tests are based on the nearest neighbor (NN) model of Gibbs free energy landscapes of DNA hybridization and on deep structural properties of DNA revealed by an approximating metric (the h-distance) in DNA spaces of oligonucleotides of a common size. Quality assessments show that the newly defined PNPG test can classify a SNP with an accuracy of about 73% for the required parameters. Among machine learning models, the best performer is a feed-forward neural network, with a fivefold cross-validation accuracy of at least 73%. These results may provide valuable tools for the SNP classification problem, where tools are lacking, by assessing the likelihood that an unclassified SNP causes disease. The tests highlight the significance of hybridization chemistry in SNPs and can be applied to make research in genomics and metabolomics more effective.
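
For readers unfamiliar with how a fivefold cross-validation accuracy is obtained, the following minimal Python sketch may help. It is not the authors' MATLAB code: the function names (`k_fold_splits`, `cross_val_accuracy`, `fit_threshold`) are hypothetical, the one-feature threshold classifier is only a toy stand-in for the feed-forward neural network, and the data are made up for illustration. It assumes the samples have already been shuffled so that every fold contains both classes.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Partition indices 0..n_samples-1 into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def cross_val_accuracy(
    xs: Sequence[float],
    ys: Sequence[int],
    fit: Callable[[List[float], List[int]], Callable[[float], int]],
    k: int = 5,
) -> float:
    """Mean held-out accuracy over k folds.

    fit(train_xs, train_ys) must return a function predict(x) -> label.
    """
    accuracies = []
    for train, test in k_fold_splits(len(xs), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


def fit_threshold(train_xs: List[float], train_ys: List[int]) -> Callable[[float], int]:
    """Toy stand-in classifier: cut at the midpoint of the two class means."""
    pos = [x for x, y in zip(train_xs, train_ys) if y == 1]
    neg = [x for x, y in zip(train_xs, train_ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


# Made-up, already interleaved toy data (label 1 = pathogenic, 0 = benign).
toy_xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85]
toy_ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
toy_accuracy = cross_val_accuracy(toy_xs, toy_ys, fit_threshold, k=5)
print(f"fivefold cross-validation accuracy: {toy_accuracy:.2f}")
```

Each sample is held out exactly once, so the reported figure averages performance on data the model did not see during fitting.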

Availability of data and material

The data used in this study are publicly available from the databases dbSNP (Sherry et al. 1999) and humsavar (Apweiler et al. 2004), with the IDs given in the tables in the Appendix.

Code availability

MATLAB code used to produce the results and graphics is submitted as supplementary material.

References

  • Andronescu M, Aguirre-Hernandez R, Condon A, Hoos HH (2003) RNAsoft: a suite of RNA secondary structure prediction and design software tools. Nucleic Acids Res 31(13):3416–3422

  • Andronescu M, Condon A, Hoos HH, Mathews DH, Murphy KP (2007) Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics 23(13):i19–i28

  • Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Yeh LSL (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res 32:115–119

  • Cáceres M (2015) Structural variants, much ado about nothing? Brief Funct Genom 14:303–304

  • Altshuler DL et al (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073

  • Garzon MH, Bobba KC (2012) A geometric approach to Gibbs energy landscapes and optimal DNA codeword design. In: Stefanovic D, Turberfield A (eds) DNA computing and molecular programming. Springer, Berlin, pp 73–85

  • Guo X (2015) Searching genome-wide disease association through SNP data. Dissertation, Georgia State University. https://scholarworks.gsu.edu/cs_diss/101

  • Hedrick PW (2011) Population genetics of malaria resistance in humans. Heredity 107(4):283–304

  • Kim S, Misra A (2007) SNP genotyping: technologies and biomedical applications. Annu Rev Biomed Eng 9:289–320

  • Kitts A, Sherry S (2002) The single nucleotide polymorphism database (dbSNP) of nucleotide sequence variation. In: McEntyre J, Ostell J (eds) The NCBI handbook. US National Center for Biotechnology Information, Bethesda

  • Mainali S, Garzon M, Colorado FA (2020) Profiling environmental conditions from DNA. In: Rojas I et al (eds) Proceedings of IWBBIO 2020, work-conference on bioinformatics and biomedical engineering. Lecture notes in bioinformatics, vol 12108, pp 647–658

  • Phan V, Garzon MH (2009) On codeword design in metric DNA spaces. Nat Comput 8(3):571

  • Reymond A, Friedli M, Henrichsen CN, Chapot F, Deutsch S, Ucla C, Antonarakis SE (2001) From PREDs and open reading frames to cDNA isolation: revisiting the human chromosome 21 transcription map. Genomics 78(1–2):46–54

  • Safa A, Omrani MD, Nicknafs F, Komaki A, Taheri M, Ghafouri-Fard S (2020) A single nucleotide polymorphism within molybdenum cofactor sulfurase gene is associated with neuropsychiatric conditions. Front Mol Biosci. https://doi.org/10.3389/fmolb.2020.540375

  • Schlötterer C (2004) The evolution of molecular markers—just a matter of fashion? Nat Rev Genet 5(1):63–69

  • Shah C (2020) A hands-on introduction to data science. Cambridge University Press, Cambridge (ISBN: 978-1-108-47244-9)

  • Sherry ST, Ward M, Sirotkin K (1999) dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res 9(8):677–679

  • Sokolov BP (1990) Primer extension technique for the detection of single nucleotide in genomic DNA. Nucleic Acids Res 18(12):3671

  • Sun H, Yu G (2019) New insights into the pathogenicity of non-synonymous variants through multi-level analysis. Sci Rep 9(1):1–11

  • Wu X, Hurst LD (2016) Determinants of the usage of splice-associated cis-motifs predict the distribution of human pathogenic SNPs. Mol Biol Evol 33(2):518–529

  • Xu J, Murphy SL, Kochanek KD (2020) Mortality in the United States, 2018. NCHS data brief no. 355

Acknowledgements

The authors acknowledge the use of the publicly available SNP databases dbSNP and humsavar. The use of the HPC facilities at the University of Memphis is also gratefully acknowledged. We also thank the anonymous reviewer(s) for comments that helped improve the content and presentation of this work.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Contributions

SA located and collected the data, optimized the choice of parameters, ran the tests, and visualized the results. MG suggested the original idea for the project, computed the Gibbs energies and h-distances, and coordinated the project overall. SM coordinated the writing and figures in the final version. All authors jointly analyzed the results and agreed with the final analyses in the paper.

Corresponding author

Correspondence to Max H Garzon.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Communicated by Stefan Hohmann.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 7, 8, and 9.

Table 7 A sample of 38 pathogenic SNPs ranked by high severity (Kitts and Sherry 2002; Andronescu et al. 2003, 2007)
Table 8 A sample of 62 coding SNPs (unranked), selected evenly over their range of occurrence (Kitts and Sherry 2002; Andronescu et al. 2003)
Table 9 A random sample of 100 noncoding SNPs (Kitts and Sherry 2002; Andronescu et al. 2003)

Cite this article

Azizzadeh-Roodpish, S., Garzon, M. & Mainali, S. Classifying single nucleotide polymorphisms in humans. Mol Genet Genomics 296, 1161–1173 (2021). https://doi.org/10.1007/s00438-021-01805-x

  • DOI: https://doi.org/10.1007/s00438-021-01805-x
