Consistency and variation of protein subcellular location annotations.
Proteins: Structure, Function, and Bioinformatics (IF 2.9), Pub Date: 2020-09-16, DOI: 10.1002/prot.26010
Ying-Ying Xu 1,2,3, Hang Zhou 2, Robert F Murphy 3, Hong-Bin Shen 2

A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human‐interpreted rather than primary data. For example, the Swiss‐Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high‐resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss‐Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss‐Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.

Updated: 2020-09-16