Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines

Proteomics (IF 3.4), Pub Date: 2019-12-18, DOI: 10.1002/pmic.201900147
Yasset Perez-Riverol, Pablo Moreno

Recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics in recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single-tiered software applications in which the analytics tasks cannot be distributed, which limits the scalability and reproducibility of the data analysis. This paper summarizes the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis. It then discusses the combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis. Finally, it introduces to the proteomics and metabolomics communities a new approach for reproducible, large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow.

Updated: 2019-12-18
