The reuse of public datasets in the life sciences: potential risks and rewards

PeerJ (IF 2.3), Pub Date: 2020-09-22, DOI: 10.7717/peerj.9954
Katharina Sielemann 1,2, Alenka Hafner 1,3, Boas Pucker 1,4

The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm has the potential to benefit all stakeholders in the life sciences.

Updated: 2020-09-22