Integrating data and analysis technologies within leading environmental research infrastructures: Challenges and approaches
Ecological Informatics (IF 5.1), Pub Date: 2021-02-06, DOI: 10.1016/j.ecoinf.2021.101245
Robert Huber , Claudio D'Onofrio , Anusuriya Devaraju , Jens Klump , Henry W. Loescher , Stephan Kindermann , Siddeswara Guru , Mark Grant , Beryl Morris , Lesley Wyborn , Ben Evans , Doron Goldfarb , Melissa A. Genazzio , Xiaoli Ren , Barbara Magagna , Hannes Thiemann , Markus Stocker

When researchers analyze data, they typically must invest significant effort in data preparation to make the data analysis-ready. This often involves cleaning, pre-processing, harmonizing, or integrating data from one or multiple sources and placing them into a computational environment in a form suitable for analysis. Research infrastructures (RIs) and their data repositories host data and make them available to researchers, but rarely offer a computational environment for data analysis. Published data are often persistently identified, but such identifiers resolve to landing pages that must be navigated (manually) to determine how the data can be accessed. Such navigation is typically difficult or impossible for machines.
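
One established web practice for making a landing page navigable by machines is to advertise typed links in the HTTP `Link` header (RFC 8288), as in the FAIR Signposting convention, where `rel="item"` points at data files and `rel="describedby"` at metadata. The sketch below is an illustration under that assumption, not a method taken from the paper: it only parses a header string (no network access), and the example URLs and header contents are hypothetical. A real client would obtain the header from an HTTP request to the landing page that the identifier resolves to.

```python
import re
from typing import Dict, List

# One link-value: a URI in angle brackets plus the parameter text that follows it.
# Simplification (assumption): no "<" occurs inside quoted parameter values.
_LINK = re.compile(r'<([^>]*)>([^<]*)')
# One parameter: name=token or name="quoted string".
_PARAM = re.compile(r';\s*([^\s=;,]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(header: str) -> List[Dict[str, str]]:
    """Parse an HTTP Link header (RFC 8288) into dicts like {"href": ..., "rel": ...}."""
    links = []
    for target, params in _LINK.findall(header):
        link = {"href": target.strip()}
        for name, quoted, token in _PARAM.findall(params):
            # findall returns "" for whichever alternative did not match.
            link[name.lower()] = quoted or token
        links.append(link)
    return links


def links_with_rel(header: str, rel: str) -> List[Dict[str, str]]:
    """Return the links whose space-separated "rel" value contains `rel`."""
    return [link for link in parse_link_header(header)
            if rel in link.get("rel", "").split()]


# Hypothetical header that a dataset landing page might return.
header = (
    '<https://example.org/data/123.csv>; rel="item"; type="text/csv", '
    '<https://example.org/meta/123.jsonld>; rel="describedby"; '
    'type="application/ld+json"'
)
for link in links_with_rel(header, "item"):
    print(link["href"], link["type"])  # prints: https://example.org/data/123.csv text/csv
```

The point of the design is that the client never scrapes HTML: it follows standard relation types, so the same few lines work for any repository that publishes such links.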

This paper surveys existing approaches for improving environmental data access in order to facilitate more rapid data analyses in computational environments, and thus to contribute to a more seamless integration of data and analysis. By analysing current state-of-the-art approaches and solutions being implemented by world-leading environmental research infrastructures, we highlight existing practices for interfacing data repositories with computational environments, as well as the challenges ahead.

We found that while the level of standardization has improved in recent years, it is still challenging for machines to discover and access data based on persistent identifiers. This is problematic with respect to the emerging requirements for FAIR (Findable, Accessible, Interoperable, and Reusable) data in general, and for the seamless integration of data and analysis in particular. A number of promising approaches could improve the state of the art. A key approach presented here involves software libraries that streamline reading data and metadata into computational environments. We describe this approach in detail for two research infrastructures. We argue that developing and maintaining specialized libraries for each RI, and for each of the programming languages used in data analysis, does not scale well.

Based on this observation, we propose a set of established standards and web practices that, if implemented by environmental research infrastructures, would enable the development of RI-independent and programming-language-independent software libraries, with much less effort required for library implementation and maintenance and considerably lower learning requirements for users. To catalyse such advancement, we propose a roadmap and key action points for technology harmonization among RIs that, we argue, will build the foundation for the efficient and effective integration of data and analysis.
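
The abstract does not enumerate the proposed standards, so the following is only an illustration of why shared conventions reduce per-RI effort: a generic client that relies solely on a standard link relation (`rel="item"`) and media types, and therefore contains no RI-specific code. The `READERS` registry, the `load_first_supported` function, the injected `fetch` callable and the example URLs are all hypothetical and are not the authors' library.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Readers keyed by media type (illustrative choice). The client needs no
# knowledge of any specific RI, only of the standard media types it supports.
READERS: Dict[str, Callable[[str], object]] = {
    "text/csv": lambda text: list(csv.DictReader(io.StringIO(text))),
    "application/json": json.loads,
}


def load_first_supported(links: List[Dict[str, str]],
                         fetch: Callable[[str], str]) -> object:
    """Load the first rel="item" link whose media type has a registered reader.

    `links` holds dicts with "href", "rel" and "type" keys (e.g. parsed from a
    Link header). `fetch` maps a URL to response text; it is injected so the
    sketch does not depend on a particular HTTP library.
    """
    for link in links:
        if "item" not in link.get("rel", "").split():
            continue  # skip metadata, citation and other non-data links
        reader = READERS.get(link.get("type", ""))
        if reader is not None:
            return reader(fetch(link["href"]))
    raise LookupError("no item link with a supported media type")


# Usage with an in-memory stand-in for HTTP (hypothetical URLs and content).
pages = {"https://example.org/d.csv": "site,temp\nA,12.5\nB,13.0\n"}
links = [
    {"href": "https://example.org/d.nc", "rel": "item", "type": "application/x-netcdf"},
    {"href": "https://example.org/d.csv", "rel": "item", "type": "text/csv"},
]
rows = load_first_supported(links, pages.__getitem__)
print(rows[0]["site"], rows[0]["temp"])  # prints: A 12.5
```

Under these assumptions, supporting another RI that publishes the same kind of links needs no new code, and supporting a new format means registering one more reader rather than writing another RI-specific library.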




Updated: 2021-02-15