Exploring fragment-based target-specific ranking protocol with machine learning on cathepsin S.
Journal of Computer-Aided Molecular Design (IF 3.0) · Pub Date: 2019-11-15 · DOI: 10.1007/s10822-019-00247-3
Yuwei Yang¹, Jianing Lu¹, Chao Yang¹, Yingkai Zhang¹,²

Cathepsin S (CatS), a member of the cysteine cathepsin protease family, has been studied extensively because of its significant role in many pathological processes, including arthritis, cancer and cardiovascular diseases. CatS inhibitors were included in D3R-GC3 for both docking pose prediction and affinity ranking, and in D3R-GC4 for binding affinity ranking. The difficulties posed by CatS inhibitors in D3R come mainly from three aspects: their large size, high flexibility and similar chemical structures. We participated in GC4; our best submitted model, which employed a similarity-based alignment docking and Vina scoring protocol, yielded a Kendall's τ of 0.23 for the 459 binders in GC4. In further explorations with machine learning, by curating a CatS-specific training set and adopting a similarity-based constrained docking method together with an arm-based fragmentation strategy that describes large inhibitors in a locality-sensitive fashion, our best structure-based ranking protocol achieves a Kendall's τ of 0.52 for all binders in GC4. Through this exploration, we demonstrate the importance of training data, docking approaches and fragmentation strategies when developing inhibitor-ranking protocols with machine learning.
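Ranking quality above is reported as Kendall's τ (0.23 for the submitted model versus 0.52 for the best later protocol). τ ranges from -1 (fully reversed ranking) through 0 (no rank agreement) to 1 (identical ranking). As a minimal sketch of how such a rank correlation is computed, the hypothetical helper `kendall_tau_b` below counts concordant and discordant pairs between predicted scores and experimental affinities, using the tau-b variant that corrects for ties. Whether the D3R evaluation used exactly this tie handling is an assumption here.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (ties corrected).

    Hypothetical helper for illustration; O(n^2), fine for a few hundred ligands.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue              # tied in both: counts toward neither term
        if dx == 0:
            only_x_tied += 1      # tied only in x
        elif dy == 0:
            only_y_tied += 1      # tied only in y
        elif (dx > 0) == (dy > 0):
            concordant += 1       # both sequences order the pair the same way
        else:
            discordant += 1       # the sequences disagree on this pair
    # Denominator = sqrt(pairs untied in x * pairs untied in y)
    denom = sqrt((concordant + discordant + only_x_tied) *
                 (concordant + discordant + only_y_tied))
    if denom == 0:
        return float("nan")       # undefined when one sequence is constant
    return (concordant - discordant) / denom


# Hypothetical values: predicted scores vs. experimental affinities,
# both oriented so that a larger number means tighter binding.
predicted = [1, 2, 3]
experimental = [1, 3, 2]
print(kendall_tau_b(predicted, experimental))  # 2 concordant, 1 discordant
```

For the three-ligand example, two pairs are ordered consistently and one is swapped, giving τ = (2 − 1)/3 ≈ 0.333. In practice, `scipy.stats.kendalltau` also computes tau-b by default. Before comparing, make sure predicted and experimental values share the same orientation: Vina scores and IC50 values are both "lower is better", while pIC50 is "higher is better".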

Updated: 2019-11-15