Incorporating environmental variables into a MODIS-based crop yield estimation method for United States corn and soybeans through the use of a random forest regression algorithm
ISPRS Journal of Photogrammetry and Remote Sensing (IF 10.6), Pub Date: 2019-12-26, DOI: 10.1016/j.isprsjprs.2019.12.012
Toshihiro Sakamoto

Satellite-based remote sensing can provide food security policy makers with reliable information, allowing them to estimate final crop yields on a global scale within reasonable time frames and at higher spatial resolution than purely statistical data allow. Satellite-based crop yield estimation methods are commonly based on the high correlation between crop yield and a vegetation index (VI) taken at a specific phenological stage. Although VI-based methods that use a single approximation formula can easily and effectively estimate the spatial distribution of corn and soybean yields in the United States (US), this approach has drawbacks that result in the underestimation of crop yields, especially in irrigated regions. A more fundamental problem is the difficulty of evaluating environmental stress-related physiological disorders such as sterility, which cannot be assessed from VIs used as a proxy for biomass. The objective of this study was therefore to overcome the limitations of the conventional approach by incorporating additional environmental variables and applying a random forest regression algorithm to estimate US corn and soybean yields with higher accuracy. This study compared three methods: (1) a conventional method based on a linear regression model calibrated using limited past data (LM method), (2) a slight variation of the LM method that uses a polynomial regression model (PM method), and (3) the newly proposed method, which applies a random forest regression algorithm and uses the irrigated harvested cropland percentage together with reanalysis data for temperature, precipitation, shortwave radiation, and soil moisture (RF method).
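To make the difference between the LM and PM methods concrete, the following is a minimal sketch of fitting a linear and a polynomial regression to the same VI and yield pairs. It is not the paper's implementation: the data points are invented, the polynomial degree (2) is an assumption because the abstract does not state it, and the helper names `fit_polynomial` and `predict` are hypothetical. The RF method is not reproduced here.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Invented (WDRVI, yield in t/ha) pairs following a curved relationship.
wdrvi = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
yield_t_ha = [4.0 + 6.0 * v + 10.0 * v * v for v in wdrvi]

lm = fit_polynomial(wdrvi, yield_t_ha, degree=1)  # LM-style linear model
pm = fit_polynomial(wdrvi, yield_t_ha, degree=2)  # PM-style model (degree assumed)

high_wdrvi = 0.70  # a hypothetical high-yield county beyond the calibration range
lm_estimate = predict(lm, high_wdrvi)
pm_estimate = predict(pm, high_wdrvi)
print(f"LM: {lm_estimate:.2f} t/ha, PM: {pm_estimate:.2f} t/ha")
```

On this invented, upward-curving data the linear fit gives a lower estimate than the polynomial fit at the high-WDRVI point, which is one way a polynomial equation might adapt better to high-yield counties.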
As a preliminary investigation, the time-series correlation between the Moderate Resolution Imaging Spectroradiometer (MODIS) wide dynamic range vegetation index (WDRVI) and corn and soybean yields was analyzed to determine the best time for recording the MODIS WDRVI as an explanatory variable in the study area. The results revealed that the MODIS WDRVI showed the highest correlation with county-level statistical yields 13 days before the silking stage for corn and 6 days before the pod-setting stage for soybeans. The regression formulas for the LM and PM methods were developed using the MODIS WDRVI at these phenological timings as the explanatory variable. The advantage of the PM method over the LM method was its adaptability to high-yield counties, an inherent effect of using a polynomial regression equation. The LM method, which used a linear regression equation calibrated with limited past data (2009–2010), could not adapt to the increased yields of recent years without recalibration with the latest data. The RF method learning models were individually optimized for each state and crop. This optimization revealed that a learning model incorporating every available variable did not always perform best, probably because of overfitting. In the major irrigated states of Kansas and Nebraska, the spatial data on the percentage of irrigated harvested cropland improved the estimation accuracy of the RF method for both corn and soybeans. In Illinois and Iowa, the RF method, which primarily incorporated the weather-related variables of soil moisture, precipitation, temperature, and shortwave radiation, improved the estimation accuracy by capturing the response of rainfed agriculture to environmental stress. This was especially true for soybeans.
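The timing analysis described above can be sketched as a search over day offsets for the one at which WDRVI correlates most strongly with yield. This is only an illustration under stated assumptions: the county values are invented, only three candidate offsets are shown, and `pearson` and `best_offset` are hypothetical helper names, not functions from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_offset(wdrvi_by_offset, yields):
    """Return (offset, r) for the day offset with the highest correlation."""
    scores = {off: pearson(vals, yields) for off, vals in wdrvi_by_offset.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented county-level corn yields (t/ha) for five counties.
yields = [9.0, 10.5, 11.0, 12.5, 13.0]

# Invented WDRVI values for the same counties, keyed by days relative to the
# silking stage (negative means before silking).
wdrvi_by_offset = {
    -20: [0.30, 0.42, 0.35, 0.50, 0.44],
    -13: [0.40, 0.47, 0.48, 0.54, 0.56],
    0: [0.55, 0.50, 0.58, 0.57, 0.60],
}

offset, r = best_offset(wdrvi_by_offset, yields)
print(f"best offset: {offset} days, r = {r:.3f}")
```

The invented values are arranged so that the offset of 13 days before silking scores highest, mirroring the corn result reported in the abstract; real data would of course decide this on its own.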
The validation results indicated that the estimation accuracy of the RF method (root mean square error, RMSE: 0.539 t/ha for corn, 0.206 t/ha for soybeans) was higher than that of the PM method (RMSE: 0.897 t/ha for corn, 0.283 t/ha for soybeans) at the state level, particularly because of the effect of bias correction in irrigated regions. Moreover, the RF method was confirmed to provide an accurate estimate of the soybean yield reduction caused by a drought during the late vegetative stages in Illinois in 2003. The PM method, which relies on VIs at key phenological stages, could not capture this drought-induced yield reduction. In the corn and soybean yield estimation maps, the RF-derived maps corresponded better than the PM-derived maps with the yield maps derived from National Agricultural Statistics Service (NASS) statistical data, especially in low-yield and irrigated regions.
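For reference, the RMSE used in the validation above is the square root of the mean squared difference between estimated and observed yields. A minimal sketch follows; the yield values are invented for illustration and are not taken from the paper.

```python
import math


def rmse(estimated, observed):
    """Root mean square error between estimated and observed values."""
    if len(estimated) != len(observed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Invented state-level yields in t/ha; every estimate is off by 0.5 t/ha.
observed = [10.0, 11.0, 9.0, 12.0]
estimated = [10.5, 10.5, 9.5, 11.5]

error = rmse(estimated, observed)
print(f"RMSE: {error:.3f} t/ha")
```

Because RMSE is expressed in the same unit as the yields (t/ha), the values for the RF and PM methods can be compared directly, with lower values indicating higher accuracy.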




Updated: 2019-12-26