PRIGSA2: Improved version of protein repeat identification by graph spectral analysis
Journal of Biosciences ( IF 2.1 ) Pub Date : 2020-07-02 , DOI: 10.1007/s12038-020-00058-x
Broto Chakrabarty , Nita Parekh

Tandemly repeated structural motifs in proteins form highly stable structural folds and provide multiple binding sites associated with diverse functional roles. The tertiary structure and function of these proteins are determined by the type and copy number of the repeating units. Each repeat type exhibits a unique pattern of intra- and inter-repeat unit interactions that is well captured by the topological features of the network representation of protein structures. Here we present an improved version of our graph-based algorithm, PRIGSA, which incorporates structure-based validation and filtering steps for accurate detection of tandem structural repeats. The algorithm integrates available knowledge on repeat families with de novo prediction to detect repeats in single monomer chains as well as in multimeric protein complexes. Three levels of performance evaluation are presented: comparison with state-of-the-art algorithms on a benchmark dataset of repeat and non-repeat proteins, accuracy in detecting members of 13 known repeat families reported in UniProt, and execution on the complete Protein Data Bank (PDB) to show its ability to identify previously uncharacterized proteins. Executing it on the PDB yields a ~3-fold increase in the coverage of members of the 13 known families and identifies 3408 novel, previously uncharacterized structural repeat proteins. PRIGSA2 is available at http://bioinf.iiit.ac.in/PRIGSA2/ .
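
To make the idea of a "network representation of protein structures" concrete, the following is a minimal sketch and not the PRIGSA2 implementation. It assumes a residue contact graph in which two residues are linked when their C-alpha atoms lie within a cutoff (7 Å here, an assumed value), and it uses the principal eigenvector of the adjacency matrix as one example of a spectral feature. The function names and the synthetic helix coordinates are hypothetical and serve only as illustration; the abstract does not specify the graph construction or the spectral quantity actually used.

```python
import math


def contact_graph(coords, cutoff=7.0):
    """Adjacency matrix linking residues whose C-alpha atoms lie within
    `cutoff` angstroms (the cutoff value is an assumption)."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


def principal_eigenvector(adj, iterations=500):
    """Power iteration on A + I. The identity shift leaves the eigenvectors
    unchanged and prevents oscillation on bipartite graphs."""
    n = len(adj)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(adj[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def toy_helix(n_residues):
    """Synthetic alpha-helix-like C-alpha trace: 2.3 angstrom radius,
    100 degree turn and 1.5 angstrom rise per residue (illustrative only)."""
    return [(2.3 * math.cos(math.radians(100 * k)),
             2.3 * math.sin(math.radians(100 * k)),
             1.5 * k) for k in range(n_residues)]


coords = toy_helix(24)
adj = contact_graph(coords)
pev = principal_eigenvector(adj)
degrees = [sum(row) for row in adj]  # number of contacts per residue
```

For a real structure, `coords` would be replaced by C-alpha coordinates parsed from a PDB file. A repeat detector would then look for periodic patterns in a per-residue profile such as `pev`, which this sketch does not attempt.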

Updated: 2020-07-02