APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools.
Methods of Information in Medicine ( IF 1.7 ) Pub Date : 2020-08-10 , DOI: 10.1055/s-0040-1712460
Vicent Giménez-Alventosa 1 , José Damián Segrelles 1 , Germán Moltó 1 , Mar Roca-Sogorb 2

Abstract

Background Scientific publications are meant to exchange knowledge among researchers, but the inability to properly reproduce computational experiments limits the quality of scientific research. Furthermore, the literature shows that more than 50% of preclinical research is irreproducible, which wastes substantial resources on unproductive research in the life sciences. As a consequence, scientific reproducibility is being fostered to promote Open Science through open databases and software tools that are typically deployed on existing computational resources. However, some computational experiments require complex virtual infrastructures, such as elastic clusters of PCs, that can be dynamically provisioned from multiple clouds. Obtaining these infrastructures requires not only an infrastructure provider but also advanced knowledge of cloud computing.

Objectives The main aim of this paper is to improve reproducibility in the life sciences in order to produce better and more cost-effective research. To that end, we seek to simplify infrastructure usage and deployment for researchers.

Methods This paper introduces the Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools (APRICOT), an open source extension for Jupyter that deploys deterministic virtual infrastructures across multiple clouds for reproducible scientific computational experiments. To illustrate its use and how APRICOT can improve the reproduction of experiments with complex computational requirements, two examples from the life sciences are provided. All requirements to reproduce both experiments are disclosed within APRICOT, so users can reproduce them.

Results To show the capabilities of APRICOT, we processed a real magnetic resonance image to accurately characterize a prostate cancer using a Message Passing Interface (MPI) cluster deployed automatically with APRICOT. The second example shows how APRICOT scales the deployed infrastructure according to the workload, using a batch cluster. This example consists of a multiparametric study of a positron emission tomography image reconstruction.

Conclusion APRICOT integrates the deployment, management, and usage of specific infrastructures for Open Science, making experiments that involve specific computational infrastructures reproducible. All experiment steps and details can be documented in a single Jupyter notebook, which includes infrastructure specifications, data storage, experiment execution, results gathering, and infrastructure termination. Thus, distributing the experiment notebook and the required data should be enough to reproduce the experiment.



Publication History

Received: 13 February 2020

Accepted: 09 April 2020

Publication Date: 10 August 2020 (online)

© 2020. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial-License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).

Georg Thieme Verlag KG
Stuttgart · New York



Updated: 2020-08-11