Accurate Sequence-Based Prediction of Deleterious nsSNPs with Multiple Sequence Profiles and Putative Binding Residues
Biomolecules (IF 5.5), Pub Date: 2021-09-09, DOI: 10.3390/biom11091337
Ruiyang Song 1, Baixin Cao 1, Zhenling Peng 2, Christopher J Oldfield 3, Lukasz Kurgan 3, Ka-Chun Wong 4, Jianyi Yang 1

Non-synonymous single nucleotide polymorphisms (nsSNPs) may cause pathogenic changes that are associated with human diseases, so accurate prediction of deleterious nsSNPs is in high demand. Existing predictors of deleterious nsSNPs achieve only modest predictive performance, leaving room for improvement. We propose a new sequence-based predictor, DMBS, to address this gap. The design of DMBS relies on the observation that deleterious mutations are likely to occur at highly conserved and functionally important positions in the protein sequence. Correspondingly, we introduce two innovative components. First, we improve the estimates of conservation by computing them from multiple sequence profiles built with two complementary databases and two complementary alignment algorithms. Second, we use putative annotations of functional/binding residues produced by two state-of-the-art sequence-based methods. These inputs are processed by a random forest model, which provides favorable predictive performance when empirically compared against five other machine-learning algorithms. Empirical results on four benchmark datasets show that DMBS achieves AUC > 0.94, outperforming current methods, including protein structure-based approaches. In particular, DMBS achieves AUC = 0.97 on the SNPdbe and ExoVar datasets, compared to AUC = 0.70 and 0.88, respectively, obtained by the best available methods. Further tests on the independent HumVar dataset show that our method significantly outperforms the state-of-the-art method SNPdryad. We conclude that DMBS provides accurate predictions that can effectively guide wet-lab experiments in a high-throughput manner.
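
The abstract does not say how conservation is computed or how the profiles are combined, so the following is only a minimal sketch of the general idea, not the DMBS implementation. It assumes that conservation can be approximated by one minus the normalized Shannon entropy of an alignment column, that scores from two alignments are combined by a simple average, and that AUC is computed as the probability that a deleterious position outscores a neutral one. The function names, the toy alignments, and the labels are all hypothetical, and the random forest step is not shown.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Return 1 - normalized Shannon entropy of one alignment column (gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)


def profile_conservation(msa):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(col) for col in zip(*msa)]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two hypothetical alignments of the same 5-residue query, standing in for
# profiles built from different databases or alignment tools.
msa_a = ["MKVLA", "MKILA", "MKVLG", "MRVLA"]
msa_b = ["MKVLA", "MKVIA", "MKVLS"]

# Assumed combination rule: average the per-position scores of both profiles.
combined = [
    (a + b) / 2
    for a, b in zip(profile_conservation(msa_a), profile_conservation(msa_b))
]

# Made-up labels for illustration only: 1 = deleterious site, 0 = neutral site.
labels = [1, 1, 0, 0, 0]
print("per-position conservation:", [round(c, 3) for c in combined])
print("AUC of conservation alone:", round(auc(combined, labels), 3))
```

In a fuller pipeline, per-position scores like `combined` would be one group of input features, alongside the putative binding-residue annotations, for the classifier that the abstract describes.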

Updated: 2021-09-10