iAMY-SCM: Improved prediction and analysis of amyloid proteins using a scoring card method with propensity scores of dipeptides
Genomics (IF 3.4), Pub Date: 2020-10-02, DOI: 10.1016/j.ygeno.2020.09.065
Phasit Charoenkwan 1 , Sakawrat Kanthawong 2 , Chanin Nantasenamat 3 , Md Mehedi Hasan 4 , Watshara Shoombuatong 3

Fast, accurate identification and characterization of amyloid proteins at a large scale is essential for understanding their role in therapeutic intervention strategies. To date, there exists only one in silico model for amyloid protein identification, namely RFAmy, which uses the random forest (RF) model in conjunction with various feature types. However, it suffers from low interpretability for biologists. Thus, it is highly desirable to develop a simple and easily interpretable prediction method whose accuracy is robust compared with the existing complicated model. In this study, we propose iAMY-SCM, the first scoring card method-based predictor for predicting and analyzing amyloid proteins. iAMY-SCM makes use of a simple weighted-sum function in conjunction with the propensity scores of dipeptides for amyloid protein identification. Cross-validation results indicated that iAMY-SCM provided an accuracy of 0.895, corresponding to 10–22% higher performance than that of widely used machine learning models. Furthermore, iAMY-SCM achieved an accuracy of 0.827 as evaluated by an independent test, which was found to be comparable to that of RFAmy and approximately 9–13% higher than that of widely used machine learning models. In addition, an analysis of the estimated propensity scores of amino acids and dipeptides was performed to provide insights into the biophysical and biochemical properties of amyloid proteins. These results demonstrate that the proposed iAMY-SCM is efficient and reliable in terms of simplicity, interpretability and implementation. To facilitate ease of use of the proposed iAMY-SCM, a user-friendly and publicly accessible web server has been established at http://camt.pythonanywhere.com/iAMY-SCM. We anticipate that iAMY-SCM will be an important tool for facilitating the large-scale prediction and characterization of amyloid proteins.
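
The weighted-sum scoring described in the abstract can be illustrated with a minimal sketch. It assumes the usual scoring card formulation, in which a sequence's score is the sum of its dipeptide frequencies weighted by per-dipeptide propensity scores, and a sequence is called amyloid when the score exceeds a threshold. The function names, the default score, the threshold and the toy propensity values are all hypothetical; they are not the optimized scores or the implementation from the paper.

```python
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def dipeptide_composition(seq):
    """Return the relative frequency of each dipeptide in the sequence."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip dipeptides containing non-standard residue symbols (e.g. X).
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return {}
    counts = Counter(valid)
    return {dp: n / len(valid) for dp, n in counts.items()}


def scm_score(seq, propensity, default=0.0):
    """Weighted sum: dipeptide frequency times dipeptide propensity score.

    `default` is a hypothetical fallback for dipeptides missing from the table.
    """
    comp = dipeptide_composition(seq)
    return sum(freq * propensity.get(dp, default) for dp, freq in comp.items())


def predict_amyloid(seq, propensity, threshold):
    """Call a sequence amyloid if its score exceeds the (assumed) threshold."""
    return scm_score(seq, propensity) > threshold


# Toy propensity table, invented purely for illustration.
toy_scores = {"VV": 900.0, "VK": 500.0, "KK": 100.0}

print(scm_score("VVVV", toy_scores))
print(predict_amyloid("VVVV", toy_scores, threshold=450.0))
print(predict_amyloid("KKKK", toy_scores, threshold=450.0))
```

Because the score is a plain weighted sum, each dipeptide's contribution to a prediction can be read directly from the table, which is the interpretability argument the abstract makes.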




Updated: 2020-10-04