ecocomDP: A flexible data design pattern for ecological community survey data, Ecological Informatics


Ecological Informatics (IF 5.8), Pub Date: 2021-07-23, DOI: 10.1016/j.ecoinf.2021.101374
Margaret O'Brien ₁, Colin A. Smith ₂, Eric R. Sokol ₃,₄, Corinna Gries ₂, Nina Lany ₅, Sydne Record ₆, Max C.N. Castorani ₇


The idea of harmonizing data is not new. Decades of amassing data in databases according to community standards - both locally and globally - have been more successful for some research domains than others. It is particularly difficult to harmonize data across studies where sampling protocols vary greatly and complex environmental conditions need to be understood to apply analytical methods correctly. However, a body of long-term ecological community observations is increasingly becoming publicly available and has been used in important studies. Here, we discuss an approach to preparing harmonized community survey data by an environmental data repository, in collaboration with a national observatory. The workflow framework and repository infrastructure are used to create a decentralized, asynchronous model to reformat data without altering original data through cleaning or aggregation, while retaining metadata about sampling methods and provenance, and enabling programmatic data access. This approach does not create another data ‘silo’ but will allow the repository to contribute subsets of available data to a variety of different analysis-ready data preparation efforts. With certain limitations (e.g., changes to the sampling protocol over time), data updates and downstream processing may be completely automated. In addition to supporting reuse of community observation data by synthesis science, a goal for this harmonization and workflow effort is to contribute these datasets to the Global Biodiversity Information Facility (GBIF) to increase the data's discovery and use.


Updated: 2021-08-01
