Balancing the needs of consumers and producers for scientific data collections
Ecological Informatics (IF 5.8), Pub Date: 2021-02-15, DOI: 10.1016/j.ecoinf.2021.101251
Deborah Agarwal, Joan Damerow, Charuleka Varadharajan, Danielle Christianson, Gilberto Pastorello, You-Wei Cheah, Lavanya Ramakrishnan

Recent emphasis and requirements for open data publication have led to significant increases in data availability in the Earth sciences, which is critical to long-tail data integration. Currently, data are often published in a repository with an identifier and citation, similar to those for papers. Subsequent publications that use the data are expected to provide a citation in the reference section of the paper. However, the format of the data citation is still evolving, particularly with regard to citing dynamic data, subsets, and collections of data. Considering the motivations of both data producers and consumers, the most pressing need is to create user-friendly solutions that provide credit for data producers and enable accurate citation of data, particularly integrated data. Providing easy-to-use data citations is a critical foundation that is required to address the socio-technical challenges around data integration. Studies that integrate data from dozens or hundreds of datasets must often include data citations in supplementary material due to page limits. However, citations in the supplementary material are not indexed, making it difficult to track citations and thus to give credit to the data producer. In this paper, we discuss our experiences and the challenges we have encountered with current citation guidance. We also review the relative merits of the currently available mechanisms designed to enable compact citation of collections of data, such as data collections, data papers, and dynamic data citations. We consider these options for three data producer scenarios: a domain-specific data collection, a data repository, and a large-scale, multidisciplinary project. We posit that a new mechanism is also needed to enable citation of multiple datasets and to give credit to data producers.




Updated: 2021-03-10