Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive.
GigaScience ( IF 11.8 ) Pub Date : 2017-06-01 , DOI: 10.1093/gigascience/gix029
Tazro Ohta , Takeru Nakazato , Hidemasa Bono

It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories makes it difficult for users to choose a suitable data set from among the large number of search results. Repository users need to be able to set a threshold that reduces the number of results to a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), using the quality control software FastQC. We obtained quality values for 1,171,313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information with the metadata of experiments and samples. We provide quality information for all of the archived sequencing data, which enables users to obtain sequencing data of sufficient quality for reanalysis. The calculated quality data are available to the public in various formats. Our data also provide an example of how a third party can enhance the reuse of public data by adding metadata to published research data.
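The threshold-based selection described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `ExperimentQuality` record, the `median_phred` field, the placeholder accessions, and the cutoff of 30 are all hypothetical and do not reflect the actual format of the published quality data.

```python
from dataclasses import dataclass


@dataclass
class ExperimentQuality:
    """Hypothetical per-experiment quality record (not the paper's schema)."""
    accession: str       # placeholder experiment identifier
    median_phred: float  # assumed summary of per-base quality


def filter_by_quality(records, min_phred):
    """Keep only the experiments whose quality meets the threshold."""
    return [r for r in records if r.median_phred >= min_phred]


# Made-up example records, for illustration only.
records = [
    ExperimentQuality("EXP_A", 36.0),
    ExperimentQuality("EXP_B", 18.5),
    ExperimentQuality("EXP_C", 30.0),
]

selected = filter_by_quality(records, min_phred=30.0)
print([r.accession for r in selected])
```

Raising or lowering `min_phred` trades the size of the selected subset against its quality, which is the kind of user-controlled filtering the abstract argues for.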

Updated: 2020-04-17