Abstract
Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in the human population and are key to personalized medicine. New tests are presented to distinguish pathogenic/malign SNPs (i.e., those likely to contribute to or cause a disease) from nonpathogenic/benign SNPs, regardless of whether they occur in coding (exon) or noncoding (intron) regions of the human genome. The tests are based on the nearest neighbor (NN) model of Gibbs free energy landscapes of DNA hybridization and on deep structural properties of DNA revealed by an approximating metric (the h-distance) in DNA spaces of oligonucleotides of a common size. Quality assessments show that the newly defined PNPG test can classify a SNP with an accuracy of about 73% for the required parameters. The best-performing machine learning model is a feed-forward neural network with a fivefold cross-validation accuracy of at least 73%. These results may provide valuable tools for the SNP classification problem, where tools are lacking, by assessing the likelihood that an unclassified SNP causes disease. The tests highlight the significance of hybridization chemistry in SNPs and can be applied to further research in genomics and metabolomics.
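To make the NN model mentioned above concrete, the following is a minimal sketch of how a nearest-neighbor Gibbs free energy can be computed for a short DNA duplex, and how a single-base substitution shifts it. It is not the authors' PNPG test or their MATLAB code. The stacking values are assumed to be the widely used unified SantaLucia (1998) parameters at 37 °C; the function names `duplex_dg` and `snp_delta_dg` are hypothetical, and the symmetry correction for self-complementary duplexes is deliberately omitted.

```python
# Assumed NN stacking free energies (kcal/mol, 37 °C), keyed by the 5'->3'
# dimer on one strand; each dimer and its reverse complement share a value.
# These are the unified SantaLucia (1998) values, an assumption of this sketch.
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
INIT_GC = 0.98  # initiation penalty for a terminal G-C pair
INIT_AT = 1.03  # initiation penalty for a terminal A-T pair


def duplex_dg(seq):
    """Free energy of seq hybridized to its perfect complement (kcal/mol)."""
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("expected a DNA string of length >= 2 over ACGT")
    # Sum the stacking term of every pair of adjacent bases.
    total = sum(NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1))
    # Add one initiation term per duplex end.
    for end in (seq[0], seq[-1]):
        total += INIT_GC if end in "GC" else INIT_AT
    return total


def snp_delta_dg(window, pos, alt):
    """Change in duplex free energy when window[pos] is replaced by alt."""
    mutated = window[:pos] + alt + window[pos + 1:]
    return duplex_dg(mutated) - duplex_dg(window)


ref = "CGTTGA"
print(round(duplex_dg(ref), 2))             # free energy of the reference window
print(round(snp_delta_dg(ref, 3, "A"), 2))  # shift caused by the T>A substitution
```

A positive shift means the substituted window hybridizes less stably to its own complement than the reference does. How such energies are combined into a classification rule is specific to the paper's method and is not reproduced here.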
Code availability
MATLAB code used to produce the results and graphics is submitted as supplementary material.
Acknowledgements
The authors acknowledge the use of the publicly available SNP databases dbSNP and humsavar. The use of the HPC facilities at the University of Memphis is also gratefully acknowledged. We also thank the anonymous reviewer(s) for comments that helped improve the content and presentation of this work.
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Contributions
SA located and collected the data, optimized the choice of parameters, ran the tests, and visualized the results. MG suggested the original idea for the project, computed the Gibbs energies and h-distances, and coordinated the project overall. SM coordinated the writing and figures in the final version. All authors jointly analyzed the results and agreed with the final analyses in the paper.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Communicated by Stefan Hohmann.
About this article
Cite this article
Azizzadeh-Roodpish, S., Garzon, M. & Mainali, S. Classifying single nucleotide polymorphisms in humans. Mol Genet Genomics 296, 1161–1173 (2021). https://doi.org/10.1007/s00438-021-01805-x