ELECTRA-DTA: a new compound-protein binding affinity prediction model based on the contextualized sequence encoding
Journal of Cheminformatics (IF 7.1), Pub Date: 2022-03-15, DOI: 10.1186/s13321-022-00591-x
Junjie Wang 1 , NaiFeng Wen 2 , Chunyu Wang 3 , Lingling Zhao 3 , Liang Cheng 4, 5

Drug-target binding affinity (DTA) reflects the strength of the drug-target interaction; therefore, predicting the DTA can considerably benefit drug discovery by narrowing the search space and pruning drug-target (DT) pairs with low binding affinity scores. Representation learning using deep neural networks has achieved promising performance compared with traditional machine learning methods; hence, extensive research efforts have been made in learning the feature representation of proteins and compounds. However, such feature representation learning relies on a large-scale labelled dataset, which is not always available. We present an end-to-end deep learning framework, ELECTRA-DTA, to predict the binding affinity of drug-target pairs. This framework incorporates an unsupervised learning mechanism to train two ELECTRA-based contextual embedding models, one for protein amino acids and the other for compound SMILES string encoding. In addition, ELECTRA-DTA leverages a squeeze-and-excitation (SE) convolutional neural network block stacked over three fully connected layers to further capture the sequential and spatial features of the protein sequence and SMILES for the DTA regression task. Experimental evaluations show that ELECTRA-DTA outperforms various state-of-the-art DTA prediction models, especially with the challenging, interaction-sparse BindingDB dataset. In target selection and drug repurposing for COVID-19, ELECTRA-DTA also offers competitive performance, suggesting its potential in speeding drug discovery and generalizability for other compound- or protein-related computational tasks.
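
The squeeze-and-excitation (SE) block mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give ELECTRA-DTA's actual configuration (reduction ratio, kernel sizes, biases, channel counts), so everything below is an assumption: the function name `se_block`, the weight matrices `w1` and `w2`, and the toy dimensions are hypothetical, and the code shows only the generic SE pattern (global average pooling, a two-layer gating network, channel-wise rescaling) on a 1D feature map, not the authors' implementation.

```python
import math
import random


def se_block(features, w1, w2):
    """Generic squeeze-and-excitation over a 1D feature map (illustrative only).

    features: list of C channels, each a list of L sequence positions.
    w1: (C // r) x C weights of the bottleneck layer (hypothetical, no bias).
    w2: C x (C // r) weights of the expansion layer (hypothetical, no bias).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every position of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy dimensions, chosen arbitrarily for the example.
C, L, r = 4, 6, 2
rng = random.Random(0)
features = [[rng.uniform(-1, 1) for _ in range(L)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

scaled, gates = se_block(features, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so the block can only attenuate a channel relative to the others; in a full model the gating weights would be learned jointly with the convolutional filters and the downstream fully connected regression layers.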

Updated: 2022-03-15