Site2Vec: a reference frame invariant algorithm for vector embedding of protein–ligand binding sites
Machine Learning: Science and Technology (IF 6.013), Pub Date: 2020-12-08, DOI: 10.1088/2632-2153/abad88
Arnab Bhadra, Kalidas Yeturu

Protein–ligand interactions are one of the fundamental types of molecular interactions in living systems. Ligands are small molecules that interact with protein molecules at specific regions on their surfaces called binding sites. Binding sites also determine the ADMET properties of a drug molecule. Tasks such as assessing protein functional similarity and detecting side effects of drugs require identifying similar binding sites in disparate proteins across diverse pathways. To this end, methods for computing similarities between binding sites are still evolving and remain an active area of research. Machine learning methods for similarity assessment require feature descriptors of binding sites. Traditional methods based on hand-engineered motifs and atomic configurations do not scale to several thousands of sites. Deep neural network algorithms, which can capture very complex input feature spaces, are now being deployed instead. However, one fundamental challenge in applying deep learning to binding site structures is the choice of input representation and reference frame. We report here a novel algorithm, Site2Vec, that derives a reference frame invariant vector embedding of a protein–ligand binding site. The method is based on pairwise distances between representative points and on the chemical composition of a site in terms of its constituent amino acids. The vector embedding serves as a locality sensitive hash function for proximity queries and for determining similar sites. The method was the top performer, with quality scores above 95%, in extensive benchmarking studies carried out over 10 data sets and against 23 other site comparison methods in the field. The algorithm supports high throughput processing and has been evaluated for stability with respect to reference frame shifts, coordinate perturbations and residue mutations.
We also provide the method as a standalone executable and a web service hosted at (http://services.iittp.ac.in/bioinfo/home).
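
The abstract's central claim, that a descriptor built from pairwise distances does not depend on the reference frame, can be illustrated with a small sketch. This is not the Site2Vec implementation: the function names, the bin width, the number of bins and the random stand-in coordinates are all assumptions made for illustration, and the amino acid composition part of the method is left out. The sketch only shows that a histogram of pairwise distances comes out the same after the point set is rotated and translated.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points):
    """Sorted list of Euclidean distances between all pairs of 3D points."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def distance_histogram(points, bin_width=2.0, n_bins=10):
    """Fixed-length, normalised histogram of pairwise distances.

    bin_width and n_bins are illustrative choices, not values from the paper.
    """
    counts = [0] * n_bins
    for d in pairwise_distances(points):
        # Distances beyond the last bin edge are clamped into the final bin.
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def rotate_z_and_translate(points, angle, shift):
    """Rigid motion: rotate about the z axis by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


# Hypothetical "representative points" of a site: random coordinates, not real data.
random.seed(0)
site = [tuple(random.uniform(-8.0, 8.0) for _ in range(3)) for _ in range(30)]

# The same site expressed in a different reference frame.
moved = rotate_z_and_translate(site, angle=1.1, shift=(5.0, -3.0, 12.0))

original = distance_histogram(site)
transformed = distance_histogram(moved)

# Largest per-bin difference between the two descriptors.
max_diff = max(abs(a - b) for a, b in zip(original, transformed))
print(max_diff)
```

Because a rigid motion preserves every inter-point distance, the two histograms agree up to floating-point rounding, so two copies of the same site in different coordinate frames map to the same vector. Fixed-length vectors of this kind can then be compared with an ordinary vector distance when searching for similar sites.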




Updated: 2020-12-08