Prot2HG: a database of protein domains mapped to the human genome.
Database: The Journal of Biological Databases and Curation ( IF 5.8 ) Pub Date : 2020-01-01 , DOI: 10.1093/database/baz161
David Stanek 1 , Dana M Bis-Brewer 2 , Cima Saghira 2 , Matt C Danzi 2 , Pavel Seeman 1 , Petra Lassuthova 1 , Stephan Zuchner 2

Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that addresses the question of whether a particular variant falls onto an annotated protein domain and directly translates chromosomal coordinates onto protein residues. The tool can perform a multiple-site query in a simple way, and the whole dataset is available for download as well as incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information protein data were retrieved using the Entrez Programming Utilities. After processing all human protein domains, residue positions were reverse translated, mapped to the reference genome hg19 and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented into the genomics research platform GENESIS in order to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When applied to patient genetic data, we found that rare (<1%) variants in the Genome Aggregation Database were significantly more often annotated onto a protein domain than common (>1%) variants. Similarly, variants described as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. In addition, we tested a dataset consisting of 60 causal variants in a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) were propagated onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to increase variant prioritization efficiency. Database URL: www.prot2hg.com.
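
The reverse-translation step described in the abstract (protein residue positions mapped back to genomic coordinates, and a genomic position translated onto a residue) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `domain_to_genomic` and `genomic_to_residue`, the toy exon coordinates, and the assumptions that coordinates are 1-based and inclusive and that the exon list contains only coding sequence starting at the first codon are all hypothetical and chosen for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (assumed convention)


def domain_to_genomic(cds_exons: List[Interval], strand: str,
                      res_start: int, res_end: int) -> List[Interval]:
    """Map protein residues res_start..res_end (1-based, inclusive)
    onto genomic intervals, splitting the domain at exon boundaries."""
    # Walk the exons in transcription order (descending for the minus strand).
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    # 0-based offsets within the spliced coding sequence covered by the domain.
    cds_from = (res_start - 1) * 3
    cds_to = res_end * 3 - 1
    intervals = []
    offset = 0  # CDS offset of the first base of the current exon
    for start, end in ordered:
        length = end - start + 1
        lo = max(cds_from, offset)
        hi = min(cds_to, offset + length - 1)
        if lo <= hi:  # the domain overlaps this exon
            if strand == "+":
                intervals.append((start + (lo - offset), start + (hi - offset)))
            else:
                intervals.append((end - (hi - offset), end - (lo - offset)))
        offset += length
    return sorted(intervals)


def genomic_to_residue(cds_exons: List[Interval], strand: str,
                       pos: int) -> Optional[int]:
    """Return the 1-based residue number encoded at genomic position pos,
    or None if pos lies outside the coding exons."""
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    offset = 0
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return (offset + within) // 3 + 1
        offset += end - start + 1
    return None


# Toy transcript: two coding exons of 9 bases each, i.e. 6 residues in total.
exons = [(101, 109), (201, 209)]
domain = domain_to_genomic(exons, "+", 3, 4)  # residues 3-4 span the intron
residue = genomic_to_residue(exons, "+", 201)
```

In this toy case the domain covering residues 3 to 4 is split across the two exons, which is why a mapped domain is represented as a list of genomic intervals rather than a single range. A real annotation would also need to handle untranslated regions, incomplete codons, and mismatches between the protein model and the reference assembly, none of which this sketch attempts.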

Updated: 2020-04-17