Toward reliable biodiversity dataset references
Ecological Informatics (IF 5.1), Pub Date: 2020-06-27, DOI: 10.1016/j.ecoinf.2020.101132
Michael J. Elliott, Jorrit H. Poelen, José A.B. Fortes

No systematic approach has yet been adopted to reliably reference and provide access to digital biodiversity datasets. Based on accumulated evidence, we argue that location-based identifiers such as URLs are not sufficient to ensure long-term data access. We introduce a method that uses dedicated data observatories to evaluate long-term URL reliability.
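
The abstract does not describe how the observatories are implemented. As a hedged illustration only, the Python sketch below assumes that an observatory records, for every URL in every inventory, the SHA-256 digest of whatever the URL returned, or None when the request failed. The helper names (`fetch_url`, `take_inventory`, `classify`) are hypothetical, and reading "unstable" as "more than one distinct digest across successful fetches" is an assumption about how the categories are defined.

```python
import hashlib
import urllib.request
from typing import Callable, Dict, List, Optional

# One observation per URL per inventory: the SHA-256 hex digest of the bytes
# the URL returned, or None if the request failed (URL unresponsive).
Observation = Optional[str]


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes behind a URL; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def take_inventory(urls: List[str],
                   fetch: Callable[[str], bytes] = fetch_url) -> Dict[str, Observation]:
    """One observatory pass: record a content digest, or None, for every URL."""
    inventory: Dict[str, Observation] = {}
    for url in urls:
        try:
            inventory[url] = hashlib.sha256(fetch(url)).hexdigest()
        except Exception:  # broad on purpose: timeouts, DNS, HTTP errors, bad URLs
            inventory[url] = None
    return inventory


def classify(history: List[Observation]) -> Dict[str, bool]:
    """Summarise one URL's observations across successive inventories."""
    failures = sum(1 for h in history if h is None)
    distinct_digests = {h for h in history if h is not None}
    unresponsive = failures > 0
    unstable = len(distinct_digests) > 1  # assumed: content changed between inventories
    return {
        "consistently_unresponsive": bool(history) and failures == len(history),
        "intermittently_unresponsive": 0 < failures < len(history),
        "unstable": unstable,
        "unresponsive_or_unstable": unresponsive or unstable,
    }
```

Running `take_inventory` on a schedule and appending each result to a per-URL history yields the series that `classify` summarises. For example, a history of `[None, "a", "a", "b"]` is flagged as intermittently unresponsive and unstable, while `["a", "a", "a"]` is neither.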

From March 2019 through May 2020, we took periodic inventories of the data provided to major biodiversity aggregators, including GBIF, iDigBio, DataONE, and BHL, by accessing the URL-based dataset references from which the aggregators retrieve data. Over the period of observation, we found that, for the URL-based dataset references available in each aggregator's data provider registry, 5% to 70% of URLs were intermittently or consistently unresponsive, 0% to 66% produced unstable content, and 20% to 75% became either unresponsive or unstable.

We propose the use of cryptographic hashing to generate content-based identifiers that can reliably reference datasets. We show that content-based identifiers facilitate decentralized archival and reliable distribution of biodiversity datasets, enabling long-term access to the referenced datasets.
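
The abstract does not specify a hash algorithm or an identifier syntax. As a hedged illustration only, the Python sketch below assumes SHA-256 and a `hash://sha256/<hex digest>` identifier form (both assumptions; the helper names `content_id`, `matches`, and `resolve` are hypothetical). It shows why such an identifier supports decentralized distribution: any copy of a dataset, fetched from any mirror or archive, can be checked against the identifier, so a stale or altered copy is rejected rather than silently used.

```python
import hashlib
from typing import Callable, Iterable, Optional

PREFIX = "hash://sha256/"  # assumed identifier syntax, chosen for illustration


def content_id(data: bytes) -> str:
    """Derive a content-based identifier: identical bytes always give the same id."""
    return PREFIX + hashlib.sha256(data).hexdigest()


def matches(data: bytes, cid: str) -> bool:
    """Check that retrieved bytes are exactly the dataset the identifier names."""
    return content_id(data) == cid


def resolve(cid: str,
            mirrors: Iterable[Callable[[str], Optional[bytes]]]) -> Optional[bytes]:
    """Try independent copies (mirrors, archives, local caches) in turn and
    return the first one whose content hashes to the requested identifier."""
    for lookup in mirrors:
        try:
            data = lookup(cid)
        except OSError:
            continue  # this copy is unreachable; try the next one
        if data is not None and matches(data, cid):
            return data
    return None  # no reachable copy carried the referenced content
```

Because the identifier depends only on the bytes, two archives holding the same dataset compute the same identifier independently, and no single URL has to stay online for the reference to remain resolvable.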




Updated: 2020-06-27