GPCR-PEnDB: a database of protein sequences and derived features to facilitate prediction and classification of G protein-coupled receptors
Database: The Journal of Biological Databases and Curation (IF 5.8), Pub Date: 2020-11-20, DOI: 10.1093/database/baaa087
Khodeza Begum 1, 2 , Jonathon E Mohl 2, 3, 4 , Fredrick Ayivor 1 , Eder E Perez 4 , Ming-Ying Leung 1, 2, 3, 4

G protein-coupled receptors (GPCRs) constitute the largest group of membrane receptor proteins in eukaryotes. Because of their significant roles in various physiological processes such as vision, smell and inflammation, GPCRs are the targets of many prescription drugs. However, the functional and sequence diversity of GPCRs has kept their prediction and classification from amino acid sequence data a challenging bioinformatics problem. Existing computational approaches, mainly based on machine learning and statistical methods, predict and classify GPCRs from amino acid sequences and sequence-derived features. In this paper, we describe GPCR-PEnDB (GPCR Prediction Ensemble Database), a searchable MySQL database of confirmed GPCRs and non-GPCRs. It was constructed to let users conveniently access useful information about GPCRs in a wide range of organisms and to compile reliable training and testing datasets for different combinations of computational tools. The database currently contains 3129 confirmed GPCR and 3575 non-GPCR sequences collected from the UniProtKB/Swiss-Prot protein database, encompassing over 1200 species. The non-GPCR entries include transmembrane proteins for evaluating how well various prediction programs distinguish GPCRs from other transmembrane proteins. Each protein is linked to information about its source organism, classification, sequence length and composition, and other derived sequence features. We present examples of using this database and its graphical user interface to query for GPCRs with specific sequence properties and to compare the accuracies of five GPCR prediction tools. This initial version of GPCR-PEnDB will provide a framework for future extensions that add sequence and feature data, facilitating the design and assessment of software tools and experimental studies that help elucidate the functional roles of GPCRs.
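To make the abstract's workflow concrete, here is a minimal, hypothetical sketch of three of its ideas: computing simple sequence-derived features (length and amino acid composition), querying GPCR entries by a sequence property, and scoring a prediction tool's accuracy against confirmed GPCR/non-GPCR labels. It is not GPCR-PEnDB's actual schema or code. The table name `protein`, its columns, the function names, and the `DEMO*` accessions are illustrative assumptions, and an in-memory SQLite table stands in for the real MySQL database.

```python
import sqlite3
from collections import Counter

# The 20 standard amino acids. Percentages are relative to the full sequence
# length, so they sum to less than 100 if non-standard letters (e.g. X, U) occur.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def derived_features(sequence: str) -> dict:
    """Sequence length and percent composition of the 20 standard amino acids."""
    seq = sequence.upper()
    counts = Counter(seq)  # Counter returns 0 for residues that never occur
    length = len(seq)
    composition = {
        aa: (100.0 * counts[aa] / length if length else 0.0) for aa in AMINO_ACIDS
    }
    return {"length": length, "composition": composition}


def build_demo_db(records):
    """Load (accession, is_gpcr, sequence) tuples into an in-memory table.

    Hypothetical schema for illustration only, not the GPCR-PEnDB schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein (accession TEXT PRIMARY KEY, is_gpcr INTEGER, length INTEGER)"
    )
    for accession, is_gpcr, sequence in records:
        conn.execute(
            "INSERT INTO protein VALUES (?, ?, ?)",
            (accession, int(is_gpcr), derived_features(sequence)["length"]),
        )
    return conn


def gpcrs_in_length_range(conn, min_len, max_len):
    """Accessions of GPCR entries whose length lies in [min_len, max_len]."""
    rows = conn.execute(
        "SELECT accession FROM protein WHERE is_gpcr = 1 AND length BETWEEN ? AND ? "
        "ORDER BY accession",
        (min_len, max_len),
    )
    return [row[0] for row in rows]


def accuracy(predictions: dict, labels: dict) -> float:
    """Fraction of labeled entries (accession -> True if GPCR) predicted correctly.

    Only accessions present in both dicts are scored.
    """
    shared = [acc for acc in labels if acc in predictions]
    if not shared:
        return 0.0
    correct = sum(predictions[acc] == labels[acc] for acc in shared)
    return correct / len(shared)
```

For example, loading the records `("DEMO1", True, "M" * 350)`, `("DEMO2", True, "M" * 120)` and `("DEMO3", False, "M" * 400)` and querying lengths 300 to 500 returns only `DEMO1`. Because the non-GPCR set includes other transmembrane proteins, an accuracy computed this way also reflects how often a tool mistakes those proteins for GPCRs.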

Updated: 2020-11-21