Modelling wheat yield with antecedent information, satellite and climate data using machine learning methods in Mexico
Agricultural and Forest Meteorology (IF 5.6), Pub Date: 2021-01-16, DOI: 10.1016/j.agrformet.2020.108317
Diego Gómez, Pablo Salvador, Julia Sanz, José Luis Casanova

Wheat is one of the most important cereal crops in the world, and demand for it is expected to increase by about 60% by 2050. Appropriate and reliable yield forecasts are therefore fundamental to ensuring price stability and food security around the globe. In this study, we developed a Machine Learning (ML) approach that combines satellite and climate data with antecedent wheat yield information (YieldBaseLine) from 2004 to 2018, at municipal level, in Mexico. We compared the performance of four linear algorithms (generalized linear model, glm; ridge regression, ridge; lasso; and partial least squares, pls) and four non-linear algorithms (k-nearest neighbours, kknn; support vector machine with radial kernel, svmR; extreme gradient boosting, xgbTree; and random forest, rf) before harvest time. Additionally, we evaluated their performance under five different feature selection scenarios (No FS, FS = 0.9, FS = 0.75, FS = 0.5 and YieldBaseLine). The models were independently tested using two different approaches: random sampling and selective sampling. In the random sampling, the linear models generally performed better under the FS = 0.5 scenario, whereas the non-linear models were less sensitive to feature reduction. The results also demonstrated the capacity of the YieldBaseLine predictor, combined with satellite and climate data, to capture the inter-annual and spatial variability in the study area. The highest prediction accuracy was obtained by the rf method (No FS), with R² = 0.84. To further test the models' operability in a simulated real-case scenario, we held out the records of the last year (2018) as the test set. The best performing model was again rf (R² = 0.81). This study proposes a robust methodology for modelling crop yield at large scale, and it may be used for operational purposes. It can therefore be of interest to decision makers, lawmakers, producers, authorities and the wheat industry. In addition, it can help to establish appropriate food security and trading policies. A similar approach can be applied to other regions or crops.
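The last-year hold-out test described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy records, the field names (`municipality`, `year`, `yield_t_ha`) and the use of a per-municipality mean of earlier yields as a stand-in for YieldBaseLine are all assumptions made for illustration. The sketch only shows how a temporal split and an R² score fit together.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def split_by_last_year(records):
    """Hold out the most recent year as the test set; train on earlier years."""
    last_year = max(r["year"] for r in records)
    train = [r for r in records if r["year"] < last_year]
    test = [r for r in records if r["year"] == last_year]
    return train, test


def baseline_by_municipality(train):
    """Mean of earlier yields per municipality.

    Hypothetical stand-in for an antecedent-yield predictor; the abstract
    does not say how YieldBaseLine is actually computed.
    """
    by_muni = {}
    for r in train:
        by_muni.setdefault(r["municipality"], []).append(r["yield_t_ha"])
    return {m: mean(values) for m, values in by_muni.items()}


# Toy records, invented for illustration only (not data from the study).
records = [
    {"municipality": "A", "year": 2016, "yield_t_ha": 5.0},
    {"municipality": "A", "year": 2017, "yield_t_ha": 6.0},
    {"municipality": "A", "year": 2018, "yield_t_ha": 6.0},
    {"municipality": "B", "year": 2016, "yield_t_ha": 3.0},
    {"municipality": "B", "year": 2017, "yield_t_ha": 3.0},
    {"municipality": "B", "year": 2018, "yield_t_ha": 4.0},
]

train, test = split_by_last_year(records)
baseline = baseline_by_municipality(train)

# Predict each held-out record from its municipality's earlier mean yield.
# A municipality absent from the training years would raise KeyError here.
observed = [r["yield_t_ha"] for r in test]
predicted = [baseline[r["municipality"]] for r in test]
score = r_squared(observed, predicted)
print(f"Hold-out R^2 for the baseline-only predictor: {score:.3f}")
```

In a fuller version, a regressor trained on satellite, climate and baseline features would replace the lookup in `predicted`, and the same split and score would apply unchanged.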




Updated: 2021-01-18