Deep structure of DNA for genomic analysis
Human Molecular Genetics (IF 3.5), Pub Date: 2021-09-10, DOI: 10.1093/hmg/ddab272
Max Garzon, Sambriddhi Mainali

Recent advances in next-generation sequencing, deep networks and other bioinformatic tools have enabled us to mine huge amounts of genomic information about living organisms in the post-microarray era. However, these tools do not explicitly account for the role of the underlying DNA biochemistry (particularly DNA hybridization) that is essential to life processes. Here, we focus more precisely on the role that DNA hybridization plays in determining properties of biological organisms at the macro level, and we illustrate this role with solutions to challenging problems in human disease. These solutions are made possible by novel structural properties of DNA hybridization landscapes, revealed by a metric model of oligonucleotides of a common length; these properties make the landscapes reminiscent of some planets in our solar system, particularly Earth and Saturn. They allow a judicious selection of so-called noncrosshybridizing (nxh) bases, which substantially reduce DNA sequences of arbitrary length to a few informative features. The information extracted with these bases is of high quality because of its very low Shannon entropy; that is, the bases minimize the uncertainty in hybridization that makes results on standard microarrays irreproducible. For example, using machine learning (ML) models, SNP classification (pathogenic/non-pathogenic) and pathogen identification can be solved with high sensitivity (~77% and 100%, respectively) and specificity (~92% and 100%, respectively), for combined taxa on a sample of over 264 full coding sequences from whole bacterial genomes and fungal mitochondrial genomes. These methods can be applied to several other research questions that could be addressed with similar genomic analyses.
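The abstract does not spell out the metric model or how an nxh base turns a sequence into features, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical hybridization distance `h_distance`: the fewest mismatches between one n-mer and the reverse complement of another over all relative shifts. It also assumes a sequence is cut into non-overlapping n-mers (`shred`) and that a fragment "hybridizes" to a base oligo when that distance is at most a chosen `threshold`. All function names, the shredding scheme and the threshold are hypothetical. The last function shows why a noncrosshybridizing base lowers Shannon entropy: if a fragment could bind m probes with equal likelihood, its binding uncertainty is log2(m) bits, which is zero when it binds at most one.

```python
from math import log2
from typing import List, Sequence

# Watson-Crick pairing; sequences are assumed to contain only A, C, G, T.
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def h_distance(x: str, y: str) -> int:
    """Hypothetical hybridization distance between two oligos of a common length n.

    Slides reverse_complement(y) along x and returns the fewest mismatches over
    all shifts; overhanging positions count as mismatches. 0 means x and y are
    perfect Watson-Crick partners; n means no aligned position pairs.
    """
    x, y = x.upper(), y.upper()
    n = len(x)
    if len(y) != n:
        raise ValueError("oligos must share a common length")
    yc = reverse_complement(y)
    best = n
    for k in range(-(n - 1), n):  # every relative offset with some overlap
        matches = sum(1 for i in range(n) if 0 <= i - k < n and x[i] == yc[i - k])
        best = min(best, n - matches)
    return best


def shred(seq: str, n: int) -> List[str]:
    """Cut a sequence into consecutive non-overlapping n-mers, dropping any remainder."""
    seq = seq.upper()
    return [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]


def nxh_signature(seq: str, base: Sequence[str], threshold: int) -> List[int]:
    """Reduce a sequence of any length to one count per base oligo.

    A fragment is counted for base oligo b when h_distance(fragment, b) <= threshold,
    i.e. when it is assumed to hybridize to b.
    """
    frags = shred(seq, len(base[0]))
    return [sum(1 for f in frags if h_distance(f, b) <= threshold) for b in base]


def binding_entropy(fragment: str, base: Sequence[str], threshold: int) -> float:
    """Shannon entropy (bits) of which base oligo a fragment binds.

    Assumes every oligo within the threshold is equally likely, so the entropy
    is log2(m) for m candidate partners and 0.0 when m <= 1.
    """
    m = sum(1 for b in base if h_distance(fragment, b) <= threshold)
    return log2(m) if m > 1 else 0.0


if __name__ == "__main__":
    base = ["AAAA", "CCCC"]  # toy base: the two oligos do not pair with each other
    print(nxh_signature("TTTTGGGGTTTT", base, threshold=0))  # [2, 1]
    print(binding_entropy("TTTT", base, threshold=0))  # 0.0: one unambiguous partner
    print(binding_entropy("TTTT", ["AAAA", "AAAT"], threshold=1))  # 1.0: cross-hybridizes
```

With the toy base, the 12-nt input becomes a 2-number feature vector; a longer input would still map to one count per base oligo, which is the kind of length-independent reduction the abstract describes. Choosing base oligos that are far apart under the distance keeps each fragment's binding unambiguous.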
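The sensitivity and specificity figures reported in the abstract are presumably the usual confusion-matrix rates. The following is a minimal sketch of how such rates are computed, assuming a binary task with the pathogenic class (or, for identification, the target pathogen) encoded as `1`. The function name and the toy labels are illustrative only and are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class.

    sensitivity = TP / (TP + FN): share of actual positives that are detected.
    specificity = TN / (TN + FP): share of actual negatives that are correctly rejected.
    A ratio whose denominator is zero is returned as NaN.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    tn = sum(1 for t, p in pairs if t != 1 and p != 1)
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels: 1 = pathogenic SNP, 0 = non-pathogenic (illustrative data only).
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(truth, preds))  # (0.75, 0.75)
```

Reading the reported numbers this way, ~77% sensitivity for SNP classification would mean roughly three in four pathogenic SNPs are flagged, while ~92% specificity would mean few non-pathogenic SNPs are mislabeled.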

Updated: 2021-09-10