Findable and reusable workflow data products: A genomic workflow case study
Semantic Web (IF 3.0), Pub Date: 2020-05-12, DOI: 10.3233/sw-200374
Alban Gaignard, Hala Skaf-Molli, Khalid Belhajjame

While workflow systems have improved the repeatability of scientific experiments, the value of the processed (intermediate) data has been overlooked so far. In this paper, we argue that the intermediate data products of workflow executions should be seen as first-class objects that need to be curated and published. Not only can these data products be exploited to save the time and resources needed when re-executing workflows, but, more importantly, publishing them will improve their reuse by the same or peer scientists in the context of new hypotheses and experiments. To assist curators in annotating (intermediate) workflow data, we exploit multiple sources of information in this work, namely: (i) the provenance information captured by the workflow system, and (ii) domain annotations provided by tool registries, such as Bio.Tools. Furthermore, we show, on a concrete bioinformatics scenario, how summarisation techniques can be used to reduce the machine-generated provenance information of such data products into concise human- and machine-readable annotations.

Updated: 2020-06-30