Generation of global 1 km daily soil moisture product from 2000 to 2020 using ensemble learning
Earth System Science Data (IF 11.2), Pub Date: 2023-05-23, DOI: 10.5194/essd-15-2055-2023
Yufang Zhang, Shunlin Liang, Han Ma, Tao He, Qian Wang, Bing Li, Jianglei Xu, Guodong Zhang, Xiaobang Liu, Changhao Xiong

Abstract. Motivated by the lack of long-term global soil moisture products with both high spatial and temporal resolutions, a global 1 km daily spatiotemporally continuous soil moisture product (GLASS SM) was generated from 2000 to 2020 using an ensemble learning model (eXtreme Gradient Boosting, XGBoost). The model was developed by integrating multiple datasets: albedo, land surface temperature, and leaf area index products from the Global Land Surface Satellite (GLASS) product suite; the European reanalysis (ERA5-Land) soil moisture product; the in situ soil moisture dataset from the International Soil Moisture Network (ISMN); and auxiliary datasets (the Multi-Error-Removed Improved-Terrain (MERIT) DEM and the global gridded soil information (SoilGrids)). Given the relatively large differences in scale between point-scale in situ measurements and the other datasets, the triple collocation (TC) method was adopted to select representative soil moisture stations and their measurements for creating the training samples. To fully evaluate model performance, three validation strategies were explored: random, site independent, and year independent. Results showed that although the XGBoost model achieved its highest accuracy on the random test samples, this was clearly a result of model overfitting. Meanwhile, training the model with representative stations selected by the TC method considerably improved its performance on site- or year-independent test samples. On the site-independent test samples, the strategy least likely to be affected by overfitting, the model trained using representative stations achieved an overall validation accuracy of 0.715 for the correlation coefficient (R) and 0.079 m³ m⁻³ for the root mean square error (RMSE).
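The three validation strategies differ only in what is held out of training. The following is a minimal sketch, assuming each training sample carries a station identifier and a year; the function name `split_samples`, the `test_fraction` default, and the random choice of held-out groups are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def split_samples(station_ids, years, strategy, test_fraction=0.2, seed=0):
    """Return a boolean mask that is True for test samples.

    strategy is "random", "site" (site independent) or "year" (year independent).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    station_ids = np.asarray(station_ids)
    years = np.asarray(years)
    rng = np.random.default_rng(seed)

    if strategy == "random":
        # Individual samples are held out, so the same station and year can
        # appear in both the training and the test set.
        n_test = max(1, int(round(test_fraction * station_ids.size)))
        test_idx = rng.choice(station_ids.size, size=n_test, replace=False)
        mask = np.zeros(station_ids.size, dtype=bool)
        mask[test_idx] = True
        return mask

    if strategy == "site":
        groups = station_ids
    elif strategy == "year":
        groups = years
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Whole groups are held out: every sample from a held-out station (or year)
    # goes to the test set, so the model never sees that station (or year).
    unique_groups = np.unique(groups)
    n_test = max(1, int(round(test_fraction * unique_groups.size)))
    held_out = rng.choice(unique_groups, size=n_test, replace=False)
    return np.isin(groups, held_out)
```

With a random split, neighbouring samples from the same station land on both sides of the split, which is one plausible reason a random test score can look better than the site- or year-independent ones.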
Moreover, compared to the model developed without station filtering, the validation accuracies of the model trained with representative stations improved significantly for most stations: the median per-station R increased from 0.64 to 0.74, and the median per-station unbiased RMSE (ubRMSE) decreased from 0.055 to 0.052 m³ m⁻³. Further validation of the GLASS SM product across four independent soil moisture networks revealed its ability to capture the temporal dynamics of measured soil moisture (R = 0.69–0.89; ubRMSE = 0.033–0.048 m³ m⁻³). Lastly, an intercomparison between the GLASS SM product and two global microwave soil moisture datasets (the 1 km Soil Moisture Active Passive/Sentinel-1 L2 Radiometer/Radar soil moisture product and the European Space Agency Climate Change Initiative combined soil moisture product at 0.25°) indicated that the derived product maintained more complete spatial coverage and exhibited high spatiotemporal consistency with those two products. The annual average GLASS SM dataset from 2000 to 2020 can be freely downloaded from https://doi.org/10.5281/zenodo.7172664 (Zhang et al., 2022a), and the complete product at daily scale is available at http://glass.umd.edu/soil_moisture/ (last access: 12 May 2023).
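The per-station scores quoted above (R, RMSE, ubRMSE) can be computed as sketched below. This is a minimal sketch using the standard definitions of these metrics; the function names and the `median_station_scores` helper are hypothetical and are not the authors' code.

```python
import numpy as np


def pearson_r(obs, est):
    """Pearson correlation coefficient between observed and estimated values."""
    return float(np.corrcoef(obs, est)[0, 1])


def rmse(obs, est):
    """Root mean square error, in the units of the inputs (e.g. m^3 m^-3)."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((est - obs) ** 2)))


def ubrmse(obs, est):
    """Unbiased RMSE: RMSE after removing the mean of each series."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    anomaly_diff = (est - est.mean()) - (obs - obs.mean())
    return float(np.sqrt(np.mean(anomaly_diff ** 2)))


def median_station_scores(station_ids, obs, est):
    """Median of per-station R and ubRMSE (hypothetical helper).

    Each station needs at least two samples with non-constant values,
    otherwise the correlation is undefined.
    """
    station_ids = np.asarray(station_ids)
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    r_values, ub_values = [], []
    for station in np.unique(station_ids):
        sel = station_ids == station
        r_values.append(pearson_r(obs[sel], est[sel]))
        ub_values.append(ubrmse(obs[sel], est[sel]))
    return float(np.median(r_values)), float(np.median(ub_values))
```

Because ubRMSE removes the mean bias, a product with a constant offset from the station record can still score well on it; that fits its use for judging how well temporal dynamics are captured.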

Updated: 2023-05-23