Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding
Nucleic Acids Research (IF 14.9), Pub Date: 2017-11-20, DOI: 10.1093/nar/gkx1145
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez, Remo Rohs

Uncovering the mechanisms that affect the binding specificity of transcription factors (TFs) is critical for understanding the principles of gene regulation. Although sequence-based models have been used successfully to predict TF binding specificities, we found that including DNA shape information in these models improved their accuracy and interpretability. Previously, we developed a method for modeling DNA binding specificities based on DNA shape features extracted from Monte Carlo (MC) simulations. Prediction accuracies of our models, however, have not yet been compared to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations. Here, we integrated DNA shape information extracted from MC or MD simulations and XRC data into predictive models of TF binding and compared their performance. Models that incorporated structural information consistently showed improved performance over sequence-based models regardless of data source. Furthermore, we derived and validated nine additional DNA shape features beyond our original set of four features. The expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width, is available in our R/Bioconductor package DNAshapeR and enables a comprehensive structural description of the double helix on a genome-wide scale.
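
The feature counts in the abstract (four original features plus nine additional ones, grouped as six intra-base pair parameters, six inter-base pair parameters, and minor groove width) can be checked with a small sketch. The abstract does not list the individual features; the names below are an assumption based on the standard helical-parameter nomenclature for base pairs and base-pair steps, with MGW, ProT, Roll, and HelT taken as the original four. The variable names are illustrative only and are not part of DNAshapeR.

```python
# Assumed feature names (standard helical parameters); not given in the abstract.
INTRA_BASE_PAIR = ["Shear", "Stretch", "Stagger", "Buckle", "ProT", "Opening"]
INTER_BASE_PAIR = ["Shift", "Slide", "Rise", "Tilt", "Roll", "HelT"]
GROOVE = ["MGW"]  # minor groove width

# Assumed original set of four features.
ORIGINAL_FOUR = ["MGW", "ProT", "Roll", "HelT"]

# Expanded repertoire: 6 intra-base pair + 6 inter-base pair + 1 groove feature.
ALL_FEATURES = INTRA_BASE_PAIR + INTER_BASE_PAIR + GROOVE

# Features added beyond the original four.
ADDITIONAL = [name for name in ALL_FEATURES if name not in ORIGINAL_FOUR]

if __name__ == "__main__":
    print(f"total features: {len(ALL_FEATURES)}")
    print(f"additional features: {len(ADDITIONAL)}")
```

Under these assumptions the two groupings agree: 6 + 6 + 1 and 4 + 9 both give 13 features.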

Updated: 2017-11-20