Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures
Journal of Cheminformatics (IF 8.6), published 2024-03-14, DOI: 10.1186/s13321-024-00821-4
Anna Carbery, Martin Buttenschoen, Rachael Skyner, Frank von Delft, Charlotte M. Deane

Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient for understanding performance on novel protein targets for which experimental structures are not available. An alternative is to provide computationally predicted protein structures as input, but this is not commonly tested. However, because of the training data used, computationally predicted protein structures tend to be extremely accurate and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show that IF-SitePred is not only competitive with state-of-the-art methods when predicting binding sites on experimental structures, but also performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.
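The abstract names the final stage of IF-SitePred only as "point cloud annotation and clustering". To make that idea concrete, here is a minimal, hypothetical sketch of how scored points around a protein might be grouped into candidate binding sites. It is not IF-SitePred's implementation. It assumes each point already carries a binding score (for example, one derived from a classifier over ESM-IF1 embeddings), and it uses single-linkage clustering with a distance cutoff, a technique chosen here purely for illustration. The function name `cluster_site_points`, the 0.5 score cutoff and the 4 Å linkage distance are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_site_points(points: Sequence[Point], scores: Sequence[float],
                        score_cutoff: float = 0.5,
                        link_dist: float = 4.0) -> List[Tuple[Point, float]]:
    """Hypothetical helper: group high-scoring points into candidate sites.

    Points with score >= score_cutoff are linked whenever two of them lie
    within link_dist (assumed to be in angstroms) of each other, which is
    single-linkage clustering. Returns (centroid, summed score) per cluster,
    ranked from highest to lowest summed score.
    """
    kept = [(p, s) for p, s in zip(points, scores) if s >= score_cutoff]
    parent = list(range(len(kept)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of kept points closer than the linkage distance.
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if math.dist(kept[i][0], kept[j][0]) <= link_dist:
                parent[find(i)] = find(j)

    # Collect the members of each connected component.
    groups: Dict[int, List[Tuple[Point, float]]] = {}
    for i, (p, s) in enumerate(kept):
        groups.setdefault(find(i), []).append((p, s))

    # Summarise each cluster as its centroid plus its total score.
    sites: List[Tuple[Point, float]] = []
    for members in groups.values():
        n = len(members)
        centroid = tuple(sum(p[k] for p, _ in members) / n for k in range(3))
        sites.append((centroid, sum(s for _, s in members)))
    sites.sort(key=lambda site: site[1], reverse=True)
    return sites


# Toy example: two tight pairs of high-scoring points and one low-scoring outlier.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0),
       (11.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
sc = [0.9, 0.8, 0.6, 0.7, 0.2]
sites = cluster_site_points(pts, sc)
for centroid, total in sites:
    print(centroid, round(total, 2))
```

In this toy input the outlier falls below the score cutoff, and the remaining points form two clusters. Ranking clusters by summed score is one simple way to order candidate sites. A density-based method such as DBSCAN would be a common alternative.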

Updated: 2024-03-15