Curation of over 10 000 transcriptomic studies to enable data reuse
Database: The Journal of Biological Databases and Curation (IF 3.4), Pub Date: 2021-01-28, DOI: 10.1093/database/baab006
Nathaniel Lim 1, 2 , Stepan Tesar 2 , Manuel Belmadani 2 , Guillaume Poirier-Morency 2 , Burak Ogan Mancarci 2, 3 , Jordan Sicherman 2, 3 , Matthew Jacobson 2 , Justin Leong 2 , Patrick Tan 2 , Paul Pavlidis 2, 4

Vast amounts of transcriptomic data reside in public repositories, but effective reuse remains challenging. Issues include unstructured dataset metadata, inconsistent data processing and quality control, and inconsistent probe–gene mappings across microarray technologies. Thus, extensive curation and data reprocessing are necessary prior to any reuse. The Gemma bioinformatics system was created to help address these issues. Gemma consists of a database of curated transcriptomic datasets, analytical software, a web interface and web services. Here we present an update on Gemma’s holdings, data processing and analysis pipelines, our curation guidelines, and software features. As of June 2020, Gemma contains 10 811 manually curated datasets (primarily human, mouse and rat), over 395 000 samples and hundreds of curated transcriptomic platforms (both microarray and RNA sequencing). Dataset topics were represented with 10 215 distinct terms from 12 ontologies, for a total of 54 316 topic annotations (mean topics/dataset = 5.2). While Gemma has broad coverage of conditions and tissues, it captures a large majority of available brain-related datasets, accounting for 34% of its holdings. Users can access the curated data and differential expression analyses through the Gemma website, RESTful service and an R package. Database URL: https://gemma.msl.ubc.ca/home.html

Updated: 2021-01-28