DNAgenie: accurate prediction of DNA-type-specific binding residues in protein sequences
Briefings in Bioinformatics ( IF 9.5 ) Pub Date : 2021-07-29 , DOI: 10.1093/bib/bbab336
Jian Zhang 1 , Sina Ghadermarzi 2 , Akila Katuwawala 3 , Lukasz Kurgan 2

Efforts to elucidate protein–DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of DNA-binding residues, they are DNA-type agnostic and frequently cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, a first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physicochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the second (refinement) step leads to a substantial reduction in cross-predictions. Empirical tests show that DNAgenie's outputs, when converted to coarse-grained protein-level predictions, compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNA. Moreover, predictions over the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
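
To make two notions from the abstract concrete, the following is a minimal illustrative sketch in Python. It is not DNAgenie's implementation: the function names, the 0.5 score threshold and the `min_residues` cutoff are hypothetical choices made for this example. The first function measures cross-prediction as the fraction of residues that bind some other ligand type but are predicted as binding the target DNA type. The second shows one simple way that residue-level scores could be collapsed into a coarse protein-level call.

```python
from typing import Sequence


def cross_prediction_rate(predicted_target: Sequence[bool],
                          binds_other: Sequence[bool]) -> float:
    """Fraction of residues binding another ligand type that are
    (incorrectly) predicted as binding the target type.

    Illustrative definition only; the paper's exact metric may differ.
    """
    if len(predicted_target) != len(binds_other):
        raise ValueError("inputs must have the same length")
    other_total = sum(bool(o) for o in binds_other)
    if other_total == 0:
        return 0.0  # no residues bind another ligand, so nothing to cross-predict
    crossed = sum(bool(p) and bool(o)
                  for p, o in zip(predicted_target, binds_other))
    return crossed / other_total


def protein_level_call(residue_scores: Sequence[float],
                       threshold: float = 0.5,
                       min_residues: int = 1) -> bool:
    """Call a protein a binder if enough residues score at or above
    the threshold. Both cutoffs are hypothetical defaults."""
    n_binding = sum(score >= threshold for score in residue_scores)
    return n_binding >= min_residues
```

For instance, with predictions for B-DNA binding and annotations of single-stranded DNA binding as the "other" type, a lower `cross_prediction_rate` would indicate that the predictor separates the two DNA types more cleanly.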

Updated: 2021-07-29