Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites
arXiv - CS - Computational Geometry. Pub Date: 2020-03-18, DOI: arxiv-2003.08149
Arnab Bhadra and Kalidas Y

Protein-ligand interactions are one of the fundamental types of molecular interactions in living systems. Ligands are small molecules that interact with protein molecules at specific regions on their surfaces called binding sites. Tasks such as assessing the functional similarity of proteins and detecting side effects of drugs require identifying similar binding sites in disparate proteins across diverse pathways. Machine learning methods for similarity assessment require feature descriptors of binding sites. Traditional methods based on hand-engineered motifs and atomic configurations do not scale to several thousand sites. Deep neural network algorithms, which can capture very complex input feature spaces, are now being deployed instead. However, one fundamental challenge in applying deep learning to binding site structures is the choice of input representation and reference frame. We report here a novel algorithm, Site2Vec, that derives a reference-frame-invariant vector embedding of a protein-ligand binding site. The method is based on pairwise distances between representative points and on the chemical composition of a site in terms of its constituent amino acids. The vector embedding serves as a locality-sensitive hash function for proximity queries and for determining similar sites. The method was the top performer, with quality scores above 95%, in extensive benchmarking studies carried out over 10 datasets and against 23 other site comparison methods. The algorithm is suited to high-throughput processing and has been evaluated for stability with respect to reference frame shifts, coordinate perturbations and residue mutations. We provide Site2Vec as a standalone executable and as a web service hosted at http://services.iittp.ac.in/bioinfo/home.
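
The invariance claim in the abstract can be illustrated with a toy example. What follows is a minimal sketch and not the Site2Vec implementation: the function names `embed_site` and `rigid_transform`, the bin width, the number of bins, the 20-letter amino acid alphabet ordering and the sample coordinates are all assumptions chosen for illustration. It shows only the general idea that a descriptor built from pairwise distances plus amino acid counts does not change when the whole site is rotated and translated.

```python
import math
from collections import Counter
from itertools import combinations

# Assumed ordering of the 20 standard amino acids (one-letter codes).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def embed_site(points, residues, bin_width=2.0, n_bins=10):
    """Toy descriptor: pairwise-distance histogram followed by residue counts.

    Hypothetical helper for illustration; not the published algorithm.
    """
    hist = [0] * n_bins
    for p, q in combinations(points, 2):
        d = math.dist(p, q)  # Euclidean distance, unchanged by rigid motion
        hist[min(int(d / bin_width), n_bins - 1)] += 1
    counts = Counter(residues)
    composition = [counts.get(a, 0) for a in ALPHABET]
    return hist + composition


def rigid_transform(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


# Made-up representative points and residue labels for a four-residue site.
site = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0), (1.0, 1.0, 7.0)]
residues = ["H", "D", "S", "G"]

# The same site expressed in a different reference frame.
moved = rigid_transform(site, angle=0.7, shift=(10.0, -4.0, 2.5))

print(embed_site(site, residues) == embed_site(moved, residues))
```

The raw coordinates in `moved` differ from those in `site`, yet the two descriptors should match, because only inter-point distances and residue counts enter the vector. A hard-binned histogram like this one can flip when a distance sits close to a bin edge, so stability under coordinate perturbations, which the abstract reports evaluating, would need a smoother encoding than this sketch uses.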

Updated: 2020-08-11