EMCBOW-GPCR: A method for identifying G-protein coupled receptors based on word embedding and wordbooks
Computational and Structural Biotechnology Journal (IF 4.4), Pub Date: 2021-08-31, DOI: 10.1016/j.csbj.2021.08.044
Wangren Qiu¹, Zhe Lv¹, Xuan Xiao¹, Shuai Shao¹, Hao Lin²

G protein-coupled receptors (GPCRs) form one of the largest families of membrane protein receptors in humans and are also important targets for many drugs. Hence, it is of great significance to determine whether a protein is a GPCR. However, identifying GPCRs by experimental methods is expensive and time-consuming. As more and more GPCR primary sequences accumulate, it has become feasible to develop a computational model that predicts GPCRs accurately and quickly. In this paper, a novel method called EMCBOW-GPCR is proposed to improve the accuracy of GPCR identification using natural language processing (NLP). To represent GPCRs, three word-embedding models and a bag-of-words model are used to extract original features. These features are then passed to a deep-learning algorithm, which extracts further features and reduces the dimensionality. Finally, the resulting features are fed into Extreme Gradient Boosting (XGBoost). In the comparison of results, the overall prediction metrics of EMCBOW-GPCR are higher than those of state-of-the-art methods. To make EMCBOW-GPCR convenient for other researchers, the method and source code have been released on GitHub at http://github.com/EMCBOW-GPCR, and a user-friendly web server for EMCBOW-GPCR has been established.
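The abstract does not say how a protein sequence is split into "words", which k-mer lengths are used, or how the embedding and bag-of-words outputs are combined. The sketch below is therefore only an illustration under stated assumptions, not the authors' implementation: it treats overlapping k-mers as words, builds a fixed-length bag-of-words count vector over all 20^k possible k-mers, mean-pools per-word vectors from a stand-in embedding table (a hypothetical dict in place of a trained word-embedding model), and concatenates the two views. All function names are hypothetical, and the deep-learning reduction and XGBoost stages are not shown.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

# The 20 standard amino acids; k-mers with other residues are not counted
# in the bag-of-words vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_words(seq: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping k-mer 'words' (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words(seq: str, k: int = 2) -> list[int]:
    """Fixed-length count vector over all 20**k possible k-mers (assumed k)."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(kmer_words(seq, k))
    return [counts[w] for w in vocab]


def mean_embedding(seq: str, embeddings: dict[str, list[float]],
                   dim: int, k: int = 3) -> list[float]:
    """Average the vectors of all k-mers present in `embeddings`.

    `embeddings` is a hypothetical stand-in for a trained word-embedding
    model; k-mers missing from it are skipped, and an all-zero vector is
    returned if no k-mer is found.
    """
    vectors = [embeddings[w] for w in kmer_words(seq, k) if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sequence_features(seq: str, embeddings: dict[str, list[float]], dim: int,
                      k_embed: int = 3, k_bow: int = 2) -> list[float]:
    """Concatenate the embedding view and the bag-of-words view."""
    bow = [float(c) for c in bag_of_words(seq, k_bow)]
    return mean_embedding(seq, embeddings, dim, k_embed) + bow


if __name__ == "__main__":
    toy = {"MKT": [1.0, 0.0], "KTA": [0.0, 1.0]}  # hypothetical 2-d embeddings
    features = sequence_features("MKTAY", toy, dim=2)
    print(len(features))   # 2 embedding dims + 400 dipeptide counts = 402
    print(features[:2])    # mean of the MKT and KTA vectors: [0.5, 0.5]
```

Concatenation is just the simplest way to merge the two views; in a pipeline shaped like the one described, vectors of this kind would be the input to the deep-learning reduction step, whose output would in turn be passed to a gradient-boosting classifier.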

Updated: 2021-08-31