SiDroForest: a comprehensive forest inventory of Siberian boreal forest investigations including drone-based point clouds, individually labeled trees, synthetically generated tree crowns, and Sentinel-2 labeled image patches


Earth System Science Data (IF 11.4), Pub Date: 2022-11-11, DOI: 10.5194/essd-14-4967-2022
Femke van Geffen, Birgit Heim, Frederic Brieger, Rongwei Geng, Iuliia A. Shevtsova, Luise Schulte, Simone M. Stuenzi, Nadine Bernhardt, Elena I. Troeva, Luidmila A. Pestryakova, Evgenii S. Zakharov, Bringfried Pflug, Ulrike Herzschuh, Stefan Kruse

The SiDroForest (Siberian drone-mapped forest inventory) data collection is an attempt to remedy the scarcity of forest structure data in the circumboreal region by providing adjusted and labeled tree-level and vegetation plot-level data for machine learning and upscaling purposes. We present datasets of vegetation composition and of tree- and plot-level forest structure for two important vegetation transition zones in Siberia, Russia: the summergreen–evergreen transition zone in Central Yakutia and the tundra–taiga transition zone in Chukotka (NE Siberia). The SiDroForest data collection consists of four datasets with complementary data types that together support in-depth, multi-purpose analyses of Siberian forest plot data from different perspectives.

i. Dataset 1 provides unmanned aerial vehicle (UAV)-borne data products covering the vegetation plots surveyed during fieldwork (Kruse et al., 2021, https://doi.org/10.1594/PANGAEA.933263). The dataset includes structure-from-motion (SfM) point clouds and red–green–blue (RGB) and red–green–near-infrared (RGN) orthomosaics. From the orthomosaics, point-cloud products were created such as the digital elevation model (DEM), canopy height model (CHM), digital surface model (DSM) and the digital terrain model (DTM). The point-cloud products provide information on the three-dimensional (3D) structure of the forest at each plot.
ii. Dataset 2 contains spatial data in the form of point and polygon shapefiles of 872 individually labeled trees and shrubs that were recorded during fieldwork at the same vegetation plots (van Geffen et al., 2021c, https://doi.org/10.1594/PANGAEA.932821). The dataset contains information on tree height, crown diameter, and species type. The individually labeled point and polygon shapefiles of trees and shrubs were generated on top of the RGB UAV orthoimages. The individual tree information collected during the expedition, such as tree height, crown diameter, and vitality, is provided in table format. This dataset can be used to link information on individual trees to the location of the specific tree in the SfM point clouds, providing, for example, the opportunity to validate the tree heights extracted from the first dataset. The dataset provides unique insights into the current state of individual trees and shrubs and allows for monitoring the effects of climate change on these individuals in the future.
iii. Dataset 3 contains a synthesis of 10 000 generated images and masks in which the tree crowns of two species of larch (Larix gmelinii and Larix cajanderi) were automatically extracted from the RGB UAV images, provided in the Common Objects in Context (COCO) format (van Geffen et al., 2021a, https://doi.org/10.1594/PANGAEA.932795). As machine-learning algorithms need a large dataset to train on, the synthetic dataset was created specifically for training machine-learning algorithms to detect Siberian larch species.
iv. Dataset 4 contains Sentinel-2 (S-2) Level-2 bottom-of-atmosphere processed labeled image patches with seasonal information and annotated vegetation categories covering the vegetation plots (van Geffen et al., 2021b, https://doi.org/10.1594/PANGAEA.933268). The dataset is created with the aim of providing a small ready-to-use validation and training dataset to be used in various vegetation-related machine-learning tasks. It enhances the data collection as it allows classification of a larger area with the provided vegetation classes.
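
As an illustration of how the elevation products in Dataset 1 relate to each other, a canopy height model is conventionally the cell-by-cell difference between the surface model and the terrain model (CHM = DSM − DTM). The following is a minimal sketch on toy grids, not the authors' processing chain: the function name, the clamping of negative heights to zero, and the sample values are assumptions made for this example.

```python
def canopy_height_model(dsm, dtm, nodata=None):
    """Return CHM = DSM - DTM cell by cell.

    Assumptions for this sketch: both grids are lists of rows with the
    same shape, negative differences are clamped to 0.0, and cells equal
    to `nodata` in either grid stay `nodata` in the result.
    """
    if len(dsm) != len(dtm):
        raise ValueError("DSM and DTM must have the same number of rows")
    chm = []
    for dsm_row, dtm_row in zip(dsm, dtm):
        if len(dsm_row) != len(dtm_row):
            raise ValueError("DSM and DTM rows must have the same length")
        row = []
        for surface, terrain in zip(dsm_row, dtm_row):
            if nodata is not None and (surface == nodata or terrain == nodata):
                row.append(nodata)
            else:
                row.append(max(0.0, surface - terrain))
        chm.append(row)
    return chm


# Toy 2 x 2 grids in metres (invented values, not taken from the dataset).
dsm = [[105.0, 112.5], [103.0, 101.0]]
dtm = [[100.0, 100.5], [103.5, 101.0]]
chm = canopy_height_model(dsm, dtm)
print(chm)
```

A tree height taken from such a grid at a crown location could then be compared with the field-measured height in Dataset 2, which is the kind of validation the abstract describes.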

The SiDroForest data collection serves a variety of user communities. The detailed vegetation cover and structure information in the first two datasets is of use for ecological applications, both for summergreen and evergreen needle-leaf forests and for tundra–taiga ecotones. Datasets 1 and 2 further support the generation and validation of land cover remote-sensing products in radar and optical remote sensing. In addition to providing information on forest structure and vegetation composition of the vegetation plots, the third and fourth datasets are prepared as training and validation data for machine-learning purposes. For example, the synthetic tree-crown dataset is generated from the raw UAV images and optimized for use in neural networks. Furthermore, the fourth SiDroForest dataset contains S-2 labeled image patches processed to a high standard that provide training data on vegetation class categories for machine-learning classification, with JavaScript Object Notation (JSON) labels provided. The SiDroForest data collection adds unique insights into remote, hard-to-reach circumboreal forest regions.
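
For readers who want to inspect the synthetic tree-crown dataset before training, a COCO-style annotation file is a JSON document with `images`, `annotations`, and `categories` lists. The sketch below counts annotated crowns per category. It is only an illustration: the file names, category names, and bounding boxes in the toy JSON are hypothetical and not taken from the published dataset, whose exact contents should be checked against the PANGAEA record.

```python
import json
from collections import Counter

# Toy COCO-style annotation text. All values here are hypothetical.
coco_text = """
{
  "images": [
    {"id": 1, "file_name": "synthetic_0001.png", "width": 448, "height": 448},
    {"id": 2, "file_name": "synthetic_0002.png", "width": 448, "height": 448}
  ],
  "categories": [
    {"id": 1, "name": "larix_gmelinii"},
    {"id": 2, "name": "larix_cajanderi"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [12, 30, 80, 95]},
    {"id": 11, "image_id": 1, "category_id": 2, "bbox": [200, 150, 60, 70]},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [90, 90, 110, 120]}
  ]
}
"""


def crowns_per_category(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(names[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


coco = json.loads(coco_text)
counts = crowns_per_category(coco)
print(counts)
```

With a real annotation file, `json.loads(coco_text)` would be replaced by `json.load` on the opened file; the counting logic stays the same as long as the file follows the COCO layout.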


Updated: 2022-11-12
