Mango: Exploratory Data Analysis for Large-Scale Sequencing Datasets.
Cell Systems (IF 9.0), Pub Date: 2019-12-04, DOI: 10.1016/j.cels.2019.11.002
Alyssa Kramer Morrow, George Zhixuan He, Frank Austin Nothaft, Eric Tongching Tu, Justin Paschall, Nir Yosef, Anthony Douglas Joseph

The decreasing cost of DNA sequencing over the past decade has led to an explosion of sequencing datasets, leaving petabytes of data to analyze. However, current sequencing visualization tools are designed to run on single machines, which limits their scalability and interactivity on modern genomic datasets. Here, we present Mango, which consists of a Jupyter notebook and a genome browser and builds on the scalability of Apache Spark. Mango removes these scalability and interactivity constraints by using multi-node compute clusters to support interactive analysis over terabytes of sequencing data. We demonstrate the scalability of the Mango tools by performing quality control analyses on 10 terabytes of data from 100 high-coverage sequencing samples from the Simons Genome Diversity Project. This enables interactive genomic exploration of multi-sample datasets that exceed the computational limits of single-node visualization tools. Mango is freely available for download with full documentation at https://bdg-mango.readthedocs.io/en/latest/.




Updated: 2019-12-04