PM4NGS, a project management framework for next-generation sequencing data analysis
GigaScience (IF 11.8), Pub Date: 2021-01-13, DOI: 10.1093/gigascience/giaa141
Roberto Vera Alvarez 1, Lorinc Pongor 2, Leonardo Mariño-Ramírez 3, David Landsman 1

Background: FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Workflow languages, workflow managers, and container technologies have helped make data analysis pipelines executable across multiple platforms in a scalable way.

Findings: Here, we present PM4NGS, a project management framework for NGS data analysis. The framework automatically creates a standard organizational structure of directories and files, manages bioinformatics tools with Docker or Bioconda, and provides data analysis pipelines in CWL format. PM4NGS also includes pre-configured Jupyter notebooks, requiring minimal Python code, that produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes, covering the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets.

Conclusions: PM4NGS is an open-source framework that creates a standard organizational structure for NGS data analysis projects. It is easy for non-bioinformaticians to install, configure, and use on personal computers and laptops, and it can run NGS data analyses on Windows 10 when the Windows Subsystem for Linux feature is enabled. The framework aims to narrow the gap between researchers in experimental laboratories who produce NGS data and the workflows used to analyze it. PM4NGS documentation is available at https://pm4ngs.readthedocs.io/.
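To make the idea of an automatically created "standard organizational structure of directories and files" concrete, here is a minimal Python sketch of scaffolding a project tree. The directory names in `PROJECT_DIRS` and the function `create_project` are hypothetical, chosen only for illustration; they are not PM4NGS's actual template or API, so consult the PM4NGS documentation for the real layout.

```python
from pathlib import Path

# Hypothetical directory layout for illustration only; NOT PM4NGS's real template.
PROJECT_DIRS = ("data", "bin", "workflows", "notebooks", "results", "doc")


def create_project(root: str, name: str) -> Path:
    """Create a project skeleton under root/name and return its path.

    Safe to call repeatedly: existing directories and an existing README
    are left untouched, so rerunning it does not clobber prior work.
    """
    project = Path(root) / name
    for sub in PROJECT_DIRS:
        # parents=True creates the project root too; exist_ok avoids errors on reruns
        (project / sub).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():
        readme.write_text(f"# {name}\n")
    return project
```

Making the scaffold idempotent is the key design choice here: every project starts from the same predictable layout, and rerunning the setup step is harmless.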

Updated: 2021-01-13