Forecasting Corn Yield With Machine Learning Ensembles.
Frontiers in Plant Science (IF 4.1), Pub Date: 2020-07-07, DOI: 10.3389/fpls.2020.01120
Mohsen Shahhosseini, Guiping Hu, Sotirios V Archontoulis

The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to predict crop yields more accurately. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility than simulation crop modeling. However, a single machine learning model can be outperformed by a “committee” of models (machine learning ensembles) that can reduce prediction bias, variance, or both, and can better capture the underlying distribution of the data. Yet many aspects remain to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier in the growing season a prediction is made, the better, but this has not been thoroughly investigated because previous studies used all available data to predict yields. This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models, with an RRMSE of 9.5%. Stacked LASSO makes the least biased predictions (MBE of 53 kg/ha), while the other ensemble models also outperform the base learners in terms of bias. In contrast, although random k-fold cross-validation is replaced by the blocked sequential procedure, stacked ensembles do not perform as well as weighted ensemble models on time series data sets, because they require the data to be IID to perform favorably. Comparing our forecasts with the literature demonstrates the acceptable performance of the proposed ensemble model.
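The metrics, the year-blocked validation, and the weighted ensemble mentioned above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not spell out: RRMSE is taken as RMSE divided by the mean observed yield, MBE as the mean of prediction minus observation, and the blocked sequential procedure as training only on years before the held-out year. The function names, the projected-gradient solver, and the synthetic data are hypothetical and are not the authors' implementation.

```python
import numpy as np

def rrmse(y_true, y_pred):
    """Relative RMSE in percent (assumed: RMSE / mean observed value)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.sqrt(np.mean((y_pred - y_true) ** 2)) / np.mean(y_true)

def mbe(y_true, y_pred):
    """Mean bias error (assumed sign: prediction minus observation)."""
    return float(np.mean(np.asarray(y_pred, float) - np.asarray(y_true, float)))

def blocked_sequential_splits(years, min_train_years=3):
    """Yield (train_idx, valid_idx); each held-out year is predicted
    from strictly earlier years, so no future data leaks into training."""
    years = np.asarray(years)
    unique = np.unique(years)
    for k in range(min_train_years, len(unique)):
        yield np.flatnonzero(years < unique[k]), np.flatnonzero(years == unique[k])

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def optimize_weights(oob_preds, y_true, steps=2000):
    """Approximate the weights that minimize the MSE of the weighted
    out-of-bag predictions, subject to w >= 0 and sum(w) = 1."""
    P, y = np.asarray(oob_preds, float), np.asarray(y_true, float)
    n, m = P.shape
    # With sum(w) = 1, P @ w - y equals E @ w, where E holds per-model errors.
    E = P - y[:, None]
    w = np.full(m, 1.0 / m)            # start from the average ensemble
    sigma = np.linalg.norm(E, 2)       # largest singular value of E
    if sigma == 0.0:                   # all models already perfect
        return w
    lr = n / (2.0 * sigma ** 2)        # step size 1/L for the MSE gradient
    for _ in range(steps):
        grad = (2.0 / n) * (E.T @ (E @ w))
        w = project_to_simplex(w - lr * grad)
    return w

# Synthetic demonstration: three hypothetical base learners of varying noise.
rng = np.random.default_rng(0)
y = rng.normal(10000.0, 1000.0, size=200)             # yields in kg/ha
oob = np.column_stack([y + rng.normal(0.0, s, size=200) for s in (300, 600, 900)])
w = optimize_weights(oob, y)
print("weights:", np.round(w, 3))
print("RRMSE weighted:", rrmse(y, oob @ w), "average:", rrmse(y, oob.mean(axis=1)))
```

Starting from equal weights means the optimized ensemble can never score worse than the average ensemble on the out-of-bag data it was fitted to. Whether that advantage carries over to unseen years is a separate question.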
Results from the scenario with partial in-season weather knowledge reveal that decent yield forecasts with an RRMSE of 9.2% can be made as early as June 1st. Moreover, the proposed model performed better than individual models and benchmark ensembles at the agricultural district and state levels as well as at the county level. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that serves as the basis for computing feature importance for the ensemble model. The findings suggest that weather features corresponding to weeks 18–24 (May 1st to June 1st) are the most important input features.
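One common way to estimate the marginal effect of a feature on an ensemble's forecasts is a partial dependence curve, with the spread of that curve used as an importance score. The abstract does not describe the authors' methodology in detail, so the sketch below is only an assumption about how such an analysis might look. The functions `ensemble_predict`, `partial_dependence`, and `pd_importance`, the toy models, and the weights are all hypothetical.

```python
import numpy as np

def ensemble_predict(models, weights, X):
    """Weighted sum of base-learner predictions; each model is a callable X -> y."""
    preds = np.column_stack([m(X) for m in models])
    return preds @ np.asarray(weights, float)

def partial_dependence(predict, X, feature, grid_size=20):
    """Average prediction as one feature is swept over a grid
    while all other features keep their observed values."""
    X = np.asarray(X, float)
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    curve = np.empty(grid_size)
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value      # force the feature to the grid value
        curve[i] = predict(X_mod).mean()
    return grid, curve

def pd_importance(predict, X, grid_size=20):
    """Importance score per feature: standard deviation of its
    partial dependence curve (a flat curve scores zero)."""
    X = np.asarray(X, float)
    return np.array([
        partial_dependence(predict, X, j, grid_size)[1].std()
        for j in range(X.shape[1])
    ])

# Toy demonstration: two hypothetical base learners that use only feature 0.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(50, 2))
models = [lambda A: 2.0 * A[:, 0], lambda A: 4.0 * A[:, 0]]
predict = lambda A: ensemble_predict(models, [0.5, 0.5], A)
print("importance per feature:", pd_importance(predict, X))
```

Because the curve is computed from the ensemble's combined prediction function, the same procedure applies to any weighting scheme without inspecting the individual base learners.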




Updated: 2020-07-31