Machine learning in crop yield modelling: A powerful tool, but no surrogate for science

https://doi.org/10.1016/j.agrformet.2021.108698
Under a Creative Commons license
open access

Highlights

  • Empirical models explained up to 70% of the nation-wide variance in crop yield.

  • Progress in management and breeding accounted for up to 34% of the variance over 40 years.

  • Air temperature and precipitation were the prevalent meteorological predictors.

  • Similar model performance was achieved with alternative predictors.

  • Soil moisture proved to be less relevant.

Abstract

Provisioning a sufficient and stable source of food requires sound knowledge about current and upcoming threats to agricultural production. To that end, machine learning approaches were used to identify the prevailing climatic and soil hydrological drivers of spatial and temporal yield variability of four crops, based on 40 years of yield data each from 351 counties in Germany. Effects of progress in agricultural management and breeding were subtracted from the data prior to the machine learning modelling by fitting smooth non-linear trends to the 95th percentiles of observed yield data. An extensive feature selection approach was then followed to identify the most relevant predictors out of a large set of candidate predictors, comprising various soil and meteorological data. Particular emphasis was placed on studying the uniqueness of identified key predictors. Random Forest and Support Vector Machine models yielded similar, although not identical, results, capturing between 50% and 70% of the spatial and temporal variance of silage maize, winter barley, winter rapeseed and winter wheat yield. Equally good performance could be achieved with different sets of predictors. Thus, identification of the most reliable models could not be based on the outcome of the model study alone but required expert judgement. Relationships between drivers and response often exhibited optimum curves, especially for summer air temperature and precipitation. In contrast, soil moisture clearly proved less relevant than meteorological drivers. In view of the expected climate change, both the excess precipitation effect and the excess heat effect deserve more attention in breeding as well as in crop modelling.
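
The detrending step described in the abstract (fitting a smooth trend to the 95th percentiles of observed yields and subtracting it) can be illustrated with a minimal sketch. This is not the authors' implementation: the quadratic trend is only an assumed stand-in for the unspecified "smooth non-linear trend", the yield data are synthetic, and all function names (`p95`, `fit_quadratic`, `detrend`) are hypothetical.

```python
import random
import statistics

def p95(values):
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(values, n=20, method="inclusive")[-1]

def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_quadratic(xs, ys):
    # Least-squares fit of y = a + b*x + c*x^2 via the normal equations.
    # Assumption: a quadratic stands in for the paper's smooth trend.
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[S[i + j] for j in range(3)] for i in range(3)]
    return solve3(A, T)

def detrend(yields_by_year):
    # yields_by_year maps a year index to the list of county yields.
    years = sorted(yields_by_year)
    mid = statistics.mean(years)
    xs = [y - mid for y in years]  # centre years for numerical stability
    q = [p95(yields_by_year[y]) for y in years]
    a, b, c = fit_quadratic(xs, q)
    trend = {y: a + b * x + c * x * x for y, x in zip(years, xs)}
    residuals = {y: [v - trend[y] for v in yields_by_year[y]] for y in years}
    return trend, residuals

# Synthetic demo data (not from the study): 40 year indices, 351 counties,
# a saturating upward trend plus Gaussian noise.
random.seed(0)
data = {
    t: [5 + 0.1 * t - 0.001 * t * t + random.gauss(0, 0.5) for _ in range(351)]
    for t in range(40)
}
trend, residuals = detrend(data)
```

Because the trend follows the upper envelope of the data, most residuals are negative; these residuals, rather than the raw yields, would then serve as the response variable for models such as Random Forest or Support Vector Machine regression.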

Graphical abstract

Performance of Random Forest (light blue) and Support Vector Machine (red) models for different predictor sets, applied to the validation data set.

Keywords

Crop modelling
Machine learning
Random forests
Support vector machine
Feature selection
Equivocality
