CATH: increased structural coverage of functional space
Nucleic Acids Research (IF 16.6). Pub Date: 2020-11-25. DOI: 10.1093/nar/gkaa1079
Ian Sillitoe 1 , Nicola Bordin 1 , Natalie Dawson 1 , Vaishali P Waman 1 , Paul Ashford 1 , Harry M Scholes 1 , Camilla S M Pang 1 , Laurel Woodridge 1 , Clemens Rauer 1 , Neeladri Sen 1 , Mahnaz Abbasian 1 , Sean Le Cornu 1 , Su Datt Lam 2 , Karel Berka 3 , Ivana Hutařová Varekova 4 , Radka Svobodova 5 , Jon Lees 6 , Christine A Orengo 1

Abstract
CATH (https://www.cathdb.info) identifies domains in protein structures from the wwPDB and classifies them into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, which adds derived data such as predicted sequence domains and functionally coherent sequence subsets (Functional Families, or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data. It adds 65,351 fully classified domain structures (+15%), for a total of 500,238 structural domains, and 151 million predicted sequence domains (+59%), all assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. FunFams now capture three times as many sequences, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages, which display variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
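The growth figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes (this is not stated explicitly in the abstract) that each percentage is measured against the corresponding count in the previous CATH release, so the previous domain count is the v4.3 total minus the newly added domains. Because "+59%" is a rounded figure, the implied earlier count of predicted sequence domains is only approximate.

```python
# Assumption: percentages are relative to the previous CATH release.
total_domains_v43 = 500_238   # structural domains in CATH v4.3 (from the abstract)
added_domains = 65_351        # fully classified domains added in v4.3 (from the abstract)

# Implied size of the previous release and the relative increase.
previous_domains = total_domains_v43 - added_domains
domain_growth = added_domains / previous_domains

# Predicted sequence domains: 151 million after a (rounded) +59% increase.
seq_domains_v43 = 151e6
implied_previous_seq = seq_domains_v43 / 1.59  # approximate, since 59% is rounded

print(previous_domains)                       # 434887
print(f"{domain_growth:.1%}")                 # 15.0%
print(f"{implied_previous_seq / 1e6:.0f} M")  # 95 M
```

The domain figures are mutually consistent: 65,351 added domains on a base of 434,887 is a 15.0% increase, matching the stated +15%.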


Updated: 2021-01-03