1 Introduction

Crop rotation is widely used by farmers all around the globe. Break crops are grown in between main crops to break weed, disease, and pest cycles and provide a yield boost to the subsequent main crop. In the context of our study, cereals are considered the main crop and broadleaf crops or pastures are the break crops. The additional yield benefit to a cereal crop following a break crop is often referred to as the ‘break crop effect’ (Kirkegaard et al. 2008). The biological mechanisms that influence this break crop effect vary from one farming system to the next. For example, in low rainfall North American wheat/legume systems, break crops may increase wheat yields through a reduction in pests (Lenssen et al. 2013) or an increase in the nitrogen supply (Chen et al. 2012). In water-limited cropping regions, break crops may decrease wheat yields if the break crop uses additional soil moisture and reduces the amount of water available for the cereal crop (Chen et al. 2012). Reviews of break crop effects and the impact of crop rotation on cereal production by Kirkegaard et al. (2008) and Angus et al. (2015) reveal that the size of the break crop effect varies considerably and negative impacts can occur. In Australian conditions, break crop benefits to cereals can equate to 0.6 t ha-1 yield benefit for cereal crops in Western Australia (Seymour et al. 2012) and 0.7 t ha-1 elsewhere (Angus et al. 2015). The size of these benefits varies with season, soil type, and the particular break crop grown.

A farmer's decision to grow a break crop is complex and depends on the size of the break crop effect and on the relative yield and profitability of the break crop (Fletcher 2019). Information extracted from agronomic experiments may not reflect farmer management practices and may not provide farmers with the necessary insights that apply to their farm (Lacoste et al. 2022). Researchers may not know if their trials and findings are representative of farmer practice, and differences between survey findings about the benefits of break crops and experiments conducted by researchers have been noted in Western Australia (Harries et al. 2015, 2022). The agronomic benefit of a break crop to the following cereal crop can also vary with the season, soil type, water availability, extent of underlying biotic stress, and soil nutrient status. It can therefore be difficult for an individual farmer to determine what impact a break crop might have on the yield of a subsequent cereal crop. Previous surveys of farm practice found that farmers grew considerably fewer break crops than was economically optimal (Robertson et al. 2010). One possible reason for this difference is that farmers may not achieve the yield gains commonly reported in the agronomic literature about the benefits of break crops on cereal production. However, recent developments in crop monitoring and crop modeling may provide farmers with the tools needed to quantify these benefits. Crop monitoring and crop modeling have evolved to a point where it is possible to conduct landscape scale evaluations of crop rotations.

Crop species, or Crop ID technology, derived from satellite classifications of crops, have been developed to track the global food supply and monitor annual plantings across most agricultural areas. Typically, people use Sentinel-1 and Sentinel-2 satellites from the European Space Agency or Landsat 8 and Landsat 9 satellites from NASA to monitor the area sown to the significant crops (Adjemian 2012; Fritz et al. 2019; Velde et al. 2019). Local organizations finesse these technologies to cope with cloud (Shendryk et al. 2019; Magno et al. 2021), unbalanced training data (Waldner et al. 2019), small fields and irregular shapes (Burke and Lobell 2017; Lebourgeois et al. 2017), or double cropping systems (Paludo et al. 2020).

Like Crop ID, crop yield can now be monitored at scale. Methods driven by Gross Primary Productivity (GPP), pioneered by Reeves et al. (2005), are now available globally (Jaafar and Mourad 2021) through Google Earth Engine. Adaptations of these approaches to estimate crop yield with GPP have been developed for Australia (Chen et al. 2020; Donohue et al. 2018), Africa (Wellington et al. 2022), and China (Yan et al. 2022). Each method to estimate crop yield via satellite incorporates the local effects of climate and terrain to varying degrees. Local models may be calibrated with experimental data, data from farmers’ fields, or data from government surveys, and typically predict crop yields with a root mean square error (RMSE) in the order of 0.6 t ha-1 for cereal crops.

Finally, crop modeling platforms, like The Agricultural Production Systems sIMulator (APSIM), have been used to create continental scale simulations and draw on local climate grids and soil type grids to generate estimates of crop yield potential (Hochman et al. 2016). Often researchers estimate the management to drive continental scale simulations. Local studies that collect farm data have been able to simulate farmer yields with some precision, but the data collection process is often time-consuming and is rarely repeated (Lawes et al. 2021; Lobell et al. 2005; Mourtzinis et al. 2018). Traditionally, process-driven crop models would be used to evaluate rotational components of the cropping system, following extensive, site-specific parameterization. For example, Araya et al. (2017) used the Decision Support System for Agrotechnology Transfer (DSSAT) to evaluate the benefit of rotation on long-term sorghum production in Kansas. Similarly, APSIM was used to model corn yield responses to crop rotation in Iowa (Puntel et al. 2016). However, crop rotations are not usually modeled at scale, in part because the permutations become intractable, and detailed site-specific data are not available.

Outputs from three modeling paradigms (satellite crop yield models, process crop models, and land use mapping with remote sensing) may be combined to provide insight into the rotational benefits on cereal crops, at scale, and across seasons. The objective is to create crop intelligence and agronomic insight without exhaustive and expensive field surveys, without the need to collect data directly from farmers, and without local and expensive small plot experiments. We describe how we combine multiple data products, created with earth observation technologies, earth observation crop models, process-based crop models, and big data analytics, to evaluate the importance of crop rotations across 20 million hectares in Western Australia. We combine these approaches to test the hypothesis that farmers do achieve break crop benefits that are comparable to those achieved in research trials and research surveys. Earth observation is used first to predict the outcome of different management decisions and second to evaluate the value of crop rotations for every field at a regional scale. We argue that outputs from this suite of information products and big data analytics could provide the precursor to inputs for agricultural decision support systems or agricultural digital twins.

2 Methods

2.1 Study area

The study area comprises the wheatbelt region of Western Australia (WA), which occupies 20 million hectares in the southwest of WA (Fig. 1). Mixed crop and livestock systems, as well as continuous cropping systems, are widely implemented across this region (Harries et al. 2022). Soils are old, weathered, and of light texture, while the climate is typically Mediterranean. That is, summers are typically hot and dry, and winters are cold and wet. Crops and annual pastures are sown in the Autumn (April/May) and harvested in late Spring or Summer (November/December). Soils are typically the lighter, sandy Tenosols, or the sandy loam Kandosols. Clays may also be classified as Vertosols or Dermosols, although these soil types are less common (Isbell 2016). The plant available water holding capacity across Western Australia ranges from 30 mm to 150 mm (Oliver and Robertson 2009). Across the Western Australian Wheatbelt, winter dominant rainfall drives grain production (Ludwig et al. 2009; Turner and Asseng 2005). Growing season rainfall generally defines an upper limit to cereal grain production (French and Schultz 1984) and averaged 260 mm from 1900 to 2016 across the entire region (Fletcher et al. 2020). The spatial distribution of growing season rainfall and thermal time is presented in Fig. 2.

Fig. 1 (a) A typical Western Australian cropping landscape, with wheat in the foreground, and canola planted behind the wheat crop. (b) The location of the Western Australian Wheatbelt.

Fig. 2 The spatial distribution of growing season rainfall and cumulative thermal time across the wheatbelt of Western Australia.

2.2 Data description: field boundaries, crop identification, and crop rotation

Field polygons, to create field boundaries, were defined for the Western Australian Wheatbelt using a process of semantic segmentation and image processing of Sentinel-2 imagery. Specific details of this process were defined by Waldner and Diakogiannis (2020). Field polygons for the entire wheatbelt, known as ePaddocks, are available at the following website (https://agdatashop.csiro.au/epaddock-australian-paddock-boundaries). In total, 301,209 fields were identified, equating to an arable farmed area of 13,980,729 ha. The total area of 20 million hectares relates to the entire land mass and includes all land uses.

The land use, in terms of the crop or pasture grown, was identified for each field annually from 2017 to 2020. Each season, training data were collected across the Western Australian Wheatbelt, and Landsat 8 and Sentinel-1 imagery were used to create a crop identification (Crop ID) for every field in the Western Australian Wheatbelt (Table 1). Details about the acquisition of training data are provided in Lawes et al. (2022). A Random Forest Classifier (Breiman 2001) was used to create the Crop ID classes that included the crop types of wheat, barley, oats, legume crops, or pasture. Classifications were generated on a per pixel basis of 25 m2. Each field was assigned a single crop class, based on the most numerous pixel class present within the field boundary. Classification accuracies for each year ranged from 74% to 77% across all land use types. Further details about the crop classification processes are described by Fowler et al. (2020). From the Crop ID classifications, both the current crop and the previous crop could be defined, and a two-year crop rotation deduced.
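The two field-level steps described above, assigning the most numerous pixel class to a field and pairing consecutive seasons into a two-year rotation, can be sketched as follows. This is a minimal illustration only: the function names, the input structures, and the alphabetical tie-breaking rule are assumptions, not part of the published workflow.

```python
from collections import Counter


def field_crop_class(pixel_classes):
    """Return the most numerous pixel class within one field boundary."""
    if not pixel_classes:
        raise ValueError("field contains no classified pixels")
    counts = Counter(pixel_classes)
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return min(counts, key=lambda c: (-counts[c], c))


def two_year_rotations(crop_by_year):
    """Pair each season's crop with the crop grown the season before.

    crop_by_year maps a year (int) to the field's Crop ID class.
    Returns {year: (previous_crop, current_crop)} for consecutive years only.
    """
    years = sorted(crop_by_year)
    return {
        year: (crop_by_year[prev], crop_by_year[year])
        for prev, year in zip(years, years[1:])
        if year == prev + 1  # skip pairs separated by a missing season
    }


if __name__ == "__main__":
    print(field_crop_class(["wheat", "wheat", "canola", "wheat"]))
    print(two_year_rotations({2017: "canola", 2018: "wheat", 2019: "wheat"}))
```

With four seasons of Crop ID (2017 to 2020), a field yields at most three such rotation pairs.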

Table 1 Landsat 8 paths and rows and Sentinel-1 orbits for the Western Australian Wheatbelt used for crop classification.

2.3 Aggregating the classification and visualizing classification output

To assist with the visualization of the geographic spread of various crop choices, crop classifications were aggregated onto a 20 km grid to illustrate how species choice varied across the Western Australian Wheatbelt. For each crop, the percentage of the crop area it occupied in each grid square was calculated. This value was calculated in each year and then averaged across the four seasons. Simpson's diversity index was applied to these data to illustrate where crop diversity was greatest and where it was least across the Western Australian Wheatbelt (Eq. 1). The index ranges between 0 and 1, and grid squares approaching 1 have a more diverse suite of crops than those where the index approaches zero. An index of zero indicates a monoculture. Diversity indices have previously been used to evaluate crop diversity in the USA (Larsen and Noack 2017). These outputs on the 20 km grid were used to highlight regions of the wheatbelt with high and low diversity that may relate to spatial variation in break crop effects. These aggregated data were not part of the machine learning process, which was conducted at the field level.

$$D=1-\left(\frac{\sum n(n-1)}{N(N-1)}\right)$$
(1)

D is Simpson's Diversity Index, which ranges between 0 and 1 for a 20 km grid square. n is the total area, in hectares, of a particular crop or pasture enterprise in a 20 km grid square. N is the total area, in hectares, of all enterprises in a 20 km grid square.
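As a worked illustration of the index, the sketch below computes D for a single grid square from the area of each enterprise. It uses the standard form of Simpson's index, with N(N − 1) in the denominator. The function name and the example areas are hypothetical.

```python
def simpson_diversity(areas_ha):
    """Simpson's diversity index for one grid square.

    areas_ha maps each crop or pasture enterprise to its area in hectares (n).
    N is the total area of all enterprises in the grid square.
    """
    total = sum(areas_ha.values())
    if total <= 1:
        raise ValueError("total area must exceed 1 ha")
    numerator = sum(n * (n - 1) for n in areas_ha.values())
    return 1 - numerator / (total * (total - 1))


if __name__ == "__main__":
    # Hypothetical grid squares: a monoculture and an even four-way mix.
    print(simpson_diversity({"wheat": 1000}))
    print(simpson_diversity({"wheat": 250, "canola": 250, "pasture": 250, "legume": 250}))
```

A grid square sown entirely to one enterprise returns 0, and the value rises towards 1 as the area is spread more evenly over more enterprises.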

2.4 Crop yield estimation with the C-Crop model

Crop yield for each wheat field was estimated with the C-Crop model from 2017 to 2020 (Donohue et al. 2018). This satellite-driven crop model, which uses GPP to estimate carbon and then wheat yield, has been developed for Australian conditions. It estimates wheat yield at the field scale with an r2 of 0.72 (Donohue et al. 2018) and provides a means of generating vast quantities of yield data, suitable for machine learning applications. Output from the C-Crop model is used as the dependent variable to explore the effect of crop rotation on wheat yield. Actual yields from farmers’ fields were not available, and this remotely sensed estimate of crop yield was the only variable that could be accessed across 20 million hectares easily to provide an estimate of the yield for a particular field. However, the method employed is repeatable and provides consistent insight across years and locations.

2.5 Crop simulation modeling with APSIM, to define the agro-environment

The APSIM crop model was used to define the agro-environment and wheat yield potential for each field. The agro-environment relates to the plant available water holding capacity of the soil (PAWC), plant available water at sowing, plant available water at harvest, and the hot and cold extremes in temperature. Variables relating to climate, such as growing season rainfall (April to October), growing degree days (season length), soil water status, and the yield potential and total above ground biomass, were output from APSIM. Here, yield potential is defined as the water limited yield potential of the crop. The nitrogen supply is unlimited and is comparable to other definitions of water limited yield potential as used by Hochman et al. (2016) and Lawes et al. (2021) for the Australian continent. To create these variables with APSIM, the centroid of each field (from ePaddocks) was used to locate the soil type from the national ASRIS soil grid and the nearest climate file from the Australian Bureau of Meteorology available from the SILO gridded dataset on a 0.05-degree grid. The ASRIS grid estimates soil type for the Australian continent at a 90 m × 90 m resolution (Grundy et al. 2015). An APSIM wheat simulation was generated for every field for every season, following the methods of Hochman et al. (2016).
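The two climate summaries named above can be illustrated with a short sketch. It is not the APSIM calculation: growing degree days are accumulated here from the daily mean temperature above a base temperature, and the base of 0 °C, the function names, and the input layout are all assumptions made for illustration.

```python
from datetime import date


def growing_season_rainfall(daily_rain, months=range(4, 11)):
    """Sum rainfall (mm) over April to October.

    daily_rain is an iterable of (date, rainfall_mm) pairs for one season.
    """
    return sum(mm for day, mm in daily_rain if day.month in months)


def growing_degree_days(daily_temps, base_c=0.0):
    """Accumulate thermal time (degree C days) from (tmin, tmax) pairs.

    Assumption: simple daily-mean method with a hypothetical 0 degree C base;
    days with a mean below the base contribute nothing.
    """
    total = 0.0
    for tmin, tmax in daily_temps:
        total += max((tmin + tmax) / 2.0 - base_c, 0.0)
    return total


if __name__ == "__main__":
    rain = [(date(2018, 3, 31), 10.0), (date(2018, 4, 1), 5.0), (date(2018, 10, 31), 2.5)]
    print(growing_season_rainfall(rain))
    print(growing_degree_days([(5.0, 15.0), (8.0, 20.0)]))
```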

Output from the APSIM model, crop rotation, and climate information are defined as the independent variables to predict wheat crop yield, derived from the C-Crop model. The details of each attribute are described in Table 2.

Table 2 Description of variables derived from Crop ID and the APSIM model, captured from Agri-Yieldz output.

A summary of these variables and the analytical workflow used to determine the importance of crop rotation in the Western Australian Wheatbelt is provided in Fig. 3.

Fig. 3 A graphical summary of the analytical approach to predict the effect of crop rotation on wheat cereal yields in every field.

Wheat crop yield, from C-Crop, was then predicted from the variables in Table 2 using a Random Forest. A recursive feature elimination (RFE) was then used to optimize feature selection and identify the most important predictors (Kuhn 2008). The RFE is a backward selection method where the predictors of any given model are ranked. The least important predictors are sequentially eliminated as part of the RFE process that is applied to the base Random Forest algorithm. The Random Forest builds many decision trees via bagging, in which each tree samples the dataset randomly with replacement. The algorithm’s final prediction is produced by averaging the estimates of each decision tree.
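The bagging step can be illustrated with a deliberately small sketch. Each "tree" here is a one-split regression stump on a single predictor, fitted to a bootstrap sample, and the ensemble prediction is the average across stumps. A Random Forest additionally grows deep trees and samples candidate predictors at each split, so this is only a toy illustration of bagging and averaging. All names and data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((y - l_mean) ** 2 for y in left) + sum((y - r_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # the bootstrap sample held a single distinct x value
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, l_mean, r_mean = stump
    return l_mean if x < threshold else r_mean


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Fit n_trees stumps, each on a sample drawn randomly with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_bagged(trees, x):
    """Average the estimates of the individual trees."""
    return mean(predict_stump(t, x) for t in trees)


if __name__ == "__main__":
    # Hypothetical data: growing season rainfall (mm) vs wheat yield (t/ha).
    rain = [150, 180, 200, 220, 260, 300, 320, 350]
    yld = [0.9, 1.0, 1.2, 1.3, 2.0, 2.2, 2.4, 2.5]
    forest = fit_bagged(rain, yld)
    print(predict_bagged(forest, 160), predict_bagged(forest, 340))
```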

Two prediction models were created. The first is a simple analysis with only four variables: year, growing season rainfall (GSR), crop rotation, and thermal time expressed as growing degree days (GDD), that predict C-Crop yield. The objective was to discover what predictive capacity a limited data set could provide the industry about crop rotation and discover how important crop rotation was to wheat productivity across the broader Western Australian Wheatbelt.

The second analysis was again designed to assess the importance of crop rotation, where additional variables provide local and seasonal context to the prediction. The objective of this analysis was to improve the prediction of crop yield at the field level. Here, the RFE and Random Forest were used to build the model of crop yield, where additional variables that are output from the APSIM crop yield model were used in the prediction. In all, 11 variables were used and included PAW at sowing, PAW at harvest, long-term average growing season rainfall, heat count, soil type, and long-term mean yield from APSIM. These data are available from the Agri-Yieldz website https://agdatashop.csiro.au/agriyieldz-the-wheat-crop-production-estimator. Variables provide an insight into the agronomic state of the field, as derived from the APSIM crop model.

Both analyses used a 10-fold cross-validation scheme. For each fold, the dataset was split in two: 90% of the dataset (164,175 observations) was used for training and 10% (18,242 observations) for testing (i.e., accuracy assessment). A graphical summary of the data, analytical workflow, and creation of output about crop rotations to underpin an agricultural digital twin is presented in Fig. 3.
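The fold sizes quoted above follow from the total of 182,417 observations (164,175 + 18,242). A minimal sketch of a 10-fold split is given below; the shuffling, the seed, and the function name are assumptions for illustration rather than the scheme used in the study.

```python
import random


def k_fold_indices(n_obs, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one observation.
    sizes = [n_obs // k + (1 if i < n_obs % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(182_417), start=1):
        print(fold, len(train), len(test))
```

Because 182,417 is not divisible by 10, seven folds hold 18,242 test observations and three hold 18,241.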

3 Results

The most dominant crop rotations were wheat following a cereal (38%), wheat following an annual pasture (30%), wheat following canola (15%), wheat following a legume crop (9%), and wheat following a long fallow (6%). The area available for analysis, over 4 years, ranged from 6.4 million ha for wheat following a cereal to 1 million ha for wheat following fallow (Table 3). Crop rotation and the crop diversity varied across the Western Australian Wheatbelt, where crop diversity was greater in the southern regions and higher rainfall zones (Figs. 4 and 5).

Table 3 Percentage of land area occupied by each land use in the Western Australian Wheatbelt, averaged over 4 years of the study.

Fig. 4 Variation in crop rotations across the Western Australian Wheatbelt for (a) cereal, (b) oilseed, (c) pasture, (d) legume crops, and (e) fallow, where the land use refers to the management carried out prior to growing a wheat crop.

Fig. 5 Geographic variation in crop diversity, as measured with Simpson's Diversity Index, across the Western Australian Wheatbelt.

Crop yield, as estimated by the C-Crop model, also varied across the Western Australian Wheatbelt and with the season. C-Crop predicted wheat yields averaged 1.84 t ha-1, with a standard deviation of 0.8 t ha-1. Mean wheat yield ranged from 1.4 t ha-1 in 2019 to 2.4 t ha-1 in 2018. Estimates from the Australian Bureau of Statistics were available for the 2017, 2018, and 2019 production seasons, and trends detected by C-Crop for WA wheat production are in line with government figures (Table 4).

Table 4 Mean C-Crop estimate of wheat yield and Australian Bureau of Statistics (ABS) wheat yield.

The first, simple Random Forest, defined as model 1, with just 4 variables, was able to predict C-Crop wheat yield from the test data with an r2 of 0.72 (std dev 0.004) and an RMSE of 0.56 t ha-1 (std dev 0.004 t ha-1). Year was the most important variable, followed by growing season rainfall, crop rotation, and thermal time. Crop rotation was important in this analysis, but the effects of crop rotation with model 1 were not conclusive (Fig. 6), and output from this model was not interrogated further.
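For readers less familiar with the two accuracy measures reported here, the sketch below shows one common way to compute them from test data. It assumes that r2 is the coefficient of determination (one minus the ratio of residual to total sum of squares); the source does not state which definition was used, and the example values are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g., t/ha)."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical C-Crop yields (observed) and model predictions, in t/ha.
    obs = [1.2, 1.8, 2.4, 0.9, 2.1]
    pred = [1.4, 1.7, 2.0, 1.1, 2.2]
    print(round(rmse(obs, pred), 3), round(r_squared(obs, pred), 3))
```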

Fig. 6 Variable importance (A) and normalized root mean square error (B) of model 1, from the 10-fold cross-validation recursive feature elimination with random forest.

The second, more complex model, defined as model 2, with numerous variables that explain the complex interaction between crop rotation and the agro-environment was able to predict C-Crop wheat yield from the test data with an r2 of 0.84 (std dev 0.003) and an RMSE of 0.40 t ha-1 (std dev 0.003 t ha-1). Variable importance scores changed markedly from model 1 to model 2. The most important variable was crop rotation, followed by thermal time and then year. Soil water status and temperature stress events were also important predictors (Fig. 7).

Fig. 7 Variable importance (A) and normalized root mean square error (B) of model 2, from the 10-fold cross-validation recursive feature elimination with random forest.

Output from model 2 was interrogated to understand why crop rotation became the most important variable to predict wheat yield, as defined by the C-Crop model. For the suite of crop rotations in Table 3, partial deviance output from model 2 revealed an 87 kg ha-1 yield advantage, on average, for wheat on legumes compared to wheat on cereal. This advantage increased to 97 kg ha-1 for wheat on canola vs wheat on cereal. In contrast, wheat following pasture and wheat following fallow had small yield disadvantages compared to wheat on wheat. These yield differences were −47 kg ha-1 and −60 kg ha-1, respectively. These average differences are smaller than the overall model RMSE. This is to be expected with a Random Forest, where crop rotation accounts for only part of the variation explained by the model. Whilst useful, these partial differences do not completely explain the value of crop rotation in this farming system. For wheat grown after canola and wheat grown after grain legumes, the median and the positive skew of the distribution demonstrate that crop rotation can occasionally deliver large yield benefits (Table 5, Fig. 8). As a consequence of the skewed distribution, there was a wheat yield increase of 200 kg ha-1 or more for 27% of wheat crops grown after canola and 26% of crops grown after a legume, relative to wheat grown after wheat. For pasture and fallow, the proportion of fields with a wheat yield increase of 200 kg ha-1 or more declined to just 8% and 7%, respectively.
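The summary statistics in this paragraph (mean difference, median, and the share of fields gaining 200 kg ha-1 or more) can be derived from per-field predictions as sketched below. The sketch assumes that predicted wheat yields are already available for each field under an alternative rotation and under wheat on wheat. The function name and the example values are hypothetical.

```python
from statistics import mean, median


def rotation_effect_summary(alt_yield, base_yield, threshold=200.0):
    """Summarise per-field yield differences (kg/ha) between two scenarios.

    alt_yield:  predicted wheat yield after the alternative previous crop.
    base_yield: predicted wheat yield for the same fields after wheat.
    """
    diffs = [a - b for a, b in zip(alt_yield, base_yield)]
    share = sum(d >= threshold for d in diffs) / len(diffs)
    return {"mean": mean(diffs), "median": median(diffs), "share_over_threshold": share}


if __name__ == "__main__":
    # Hypothetical predictions for four fields, in kg/ha.
    after_canola = [2100, 1900, 2500, 1800]
    after_wheat = [2000, 2000, 2000, 1800]
    print(rotation_effect_summary(after_canola, after_wheat))
```

A positively skewed set of differences shows up as a mean above the median together with a sizeable share of fields over the threshold, which is the pattern described for canola and legumes.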

Table 5 Overall differences from model 2 outputs comparing the effects of each crop rotation choice on wheat yields.

Fig. 8 Histograms illustrating the number of fields that would have experienced either a yield benefit, or loss, if an alternative rotation choice compared to wheat on wheat was implemented by the farmer, for each year of the investigation. A red dashed line is a negative mean yield effect. A blue dashed line is a positive mean yield effect.

Model 2 accounts for the complex agro-climatic interactions with crop rotation. For the individual field, the influence of the other variables related to climate and soil type can either increase or decrease the impact of a crop rotation choice for that particular field. Simple “rules of thumb” could not be extracted by directly comparing fields with positive break crop effects and those with negative break crop effects. The benefit of the rotation effect on wheat yield varied (Fig. 8).

The complexities of individual seasons, soil types, starting soil moisture, and crop stress complicate where and when particular rotation choices benefit cereal yields. In Fig. 9, the effect of canola on wheat yields is depicted spatially. In 2018 and in 2020, yield benefits to cereals are most evident in the north-eastern section of the Western Australian Wheatbelt. The benefits to cereals from canola were noticeably lower in 2017 and 2019. The north-eastern section of the Western Australian Wheatbelt is characterized as drier, with a shorter growing season than the southern and central regions (Fig. 2).

Fig. 9 Geographic spread of the positive and negative benefit of canola on wheat yield across the Western Australian Wheatbelt from 2017 to 2020, if a wheat-canola crop rotation was implemented instead of the wheat-cereal rotation.

4 Discussion

Crop rotation is an important component of most cropping systems, as crop rotation can increase the yields of the dominant crop by reducing the extent of biotic threats and, in the case of legume breaks, increase the nutrient supply. However, the value, in terms of the actual yield benefit, that a break crop provides to a subsequent crop can vary. This variability may complicate the decision to grow a break crop. Therefore, in this research, we utilized remote sensing information about crop identification and crop yield, with process-based crop modeling and machine learning to quantify the yield benefit of crop rotation to the yield of the subsequent cereal crop. We performed this assessment, at scale, for every field in the entire 20 million hectares of Western Australia over four growing seasons. While our study used the Western Australian Wheatbelt as a test case, the approach could be applied to any agricultural region in the world provided that appropriate methodologies to estimate crop type and yield were available.

The specific questions tested in this research were (1) what are the rotations in WA, and (2) what is the break crop effect on wheat production in WA? The study demonstrated that the most common rotation in WA was wheat following a cereal (Table 3 and Fig. 4). This was followed by wheat grown after a pasture and then wheat after canola. Broadly, the findings in this study relating to the relative popularity of particular farming practices and crop rotations agree with insights from conventional ground surveys (Harries et al. 2015, 2022). This was to be expected, given that Australian crop classification approaches have been developed with extensive training data and predict with accuracies in the order of 80% (Lawes et al. 2022).

The agronomic impact of crop choice and rotation on the subsequent yield of the wheat crop did vary from insights derived from farmer surveys and from individual agronomic trials conducted in Western Australia. Here, we identified moderate benefits and predict that in at least 70% of wheat fields, the break crop benefit is less than 200 kg ha-1. This contrasts with field surveys, where wheat yields following canola crops generated yield gains of 220 kg ha-1 in a 200 mm rainfall environment and 330 kg ha-1 in a 300 mm rainfall environment (Harries et al. 2022). The analysis of Harries et al. (2022) suggests that break crop effects increase with increasing rainfall. Furthermore, Harries et al. (2022) also identified that wheat after pasture would yield 400 kg ha-1 more than wheat after wheat in a 200 mm rainfall environment. The findings of Harries et al. (2022) contrast with the earlier meta-analysis of Western Australian field trials by Seymour et al. (2012), who identified that the break crop effect was not dependent on rainfall and averaged 0.6 t ha-1 for wheat following a lupin crop and 0.4 t ha-1 for wheat following a canola crop. Therefore, farm surveys have provided different insights into crop rotation, relative to those derived from a meta-analysis of numerous field trials. These differences complicate creating simplified rules for farmers to follow. The differences add weight to the view of Lacoste et al. (2022) that on-farm experimentation is often required to help farmers evaluate a technology or management concept.

In the present study, pastures generated a net penalty to wheat crops, and break crops such as canola and legume crops offered a moderate yield improvement to wheat crops. Both of these findings are at odds with the agronomic literature and with insights derived from farm surveys. This raises important questions about the value of specific agronomic trials for individual farmers, where the conditions of the trial vary from the farm. For example, break crop experiments may be located on sites that have a biotic stress present. This stress would bias the outcome in favor of cereal crops grown following a break crop that mitigated the biotic stress. Researchers may deliberately choose sites with such a stress to demonstrate the management advantages of this practice to farmers. Such practices are normal, as researchers deliberately design and locate field trials to provide the insight required to address a problem. However, if farms experience lower levels of biotic stress than what is commonly seen in field trials, then the benefits of a break crop may be smaller than anticipated.

Biases, or inconsistencies between trial data and farm data, may materialize for other reasons. For example, the pastures commonly grown by farmers throughout Western Australia may no longer be legume dominant. This would limit the amount of nitrogen fixed, and grass dominant pastures would not provide a break from pathogens. The benefits from these volunteer, weedy pastures would be considerably lower than the benefits that optimally managed, legume dominant pastures provide to the following cereal crop. A recent study by Loi et al. (2022) found that wheat following a well-managed, modern legume pasture cultivar produced similar yields to wheat following a fallow, but required less applied nitrogen to achieve that yield. Again, though, a direct comparison between wheat following wheat and wheat following a managed pasture was not conducted.

The variation between individual fields, particularly across the Western Australian Wheatbelt, will be greater than the variation that exists in field trials. Therefore, defining the environment of a field, in the context of starting soil water, soil type, growing season rainfall, and other stresses, becomes important when those factors interact with a particular management strategy. Here, this study predicted a wheat yield response to crop rotation suggesting that break crop benefits were most likely in the lower rainfall and shorter growing season regions of the Western Australian Wheatbelt. This finding also conflicts with the findings of the earlier studies. The ability to provide management insights, at scale, using local information as a component of a machine learning model is highly novel. We contend that the Random Forest predictions can cope with high-level interactions and with factors such as starting soil moisture and end of season temperature stress. The approach adopted here was able to quickly provide a prediction for every field in the Western Australian Wheatbelt, and each prediction could be evaluated on a case-by-case basis by a farmer. The discrepancies between the findings in this study and those of Harries et al. (2022) and Seymour et al. (2012) suggest that field-level predictions will require considerable development and engagement with farmers before these more abstract, machine-learned approaches could be deployed into on-farm management decision support systems.

The analytical approach combined insights from remote sensing and APSIM using machine learning to estimate the rotational benefits for cereal crops. A necessary evolutionary step may involve updating or reconciling each of these estimates with actual data recorded on farm. Since data will not exist on every farm, analytical methods that allow the seamless integration and updating of model predictions for farms with data will need to be developed. This challenge is not trivial from an analytical perspective. Further model-based refinements may be required if farmers have detailed management information about fertilizer or weed control. Thus, in essence, the framework developed here provides the industry with the ability to benchmark production and explores possible benefits of alternative rotation choices. The approach here does combine multiple information sources, but in the future, the approach may need to combine even more disparate sources of information to create a believable and actionable management recommendation for a farmer. That is, this approach allows more nuanced findings to be provided to an individual field and draws on a vast number of permutations relating to season, soil type, and climate to arrive at those predictions. The approach can provide predictions for an individual field and allow users to evaluate different scenarios with a prediction.

Further developments would need to actively consider who the users are and what their needs are. Outputs from the modeling would be valuable for individual farmers and their consultants and agronomists. Researchers may benefit, particularly if machine learning approaches such as RFE and modeling based approaches deliver insights that run counter to insights derived from field trials. The analytics and model-based information systems developed in this study that created insights about crop rotation at scale could become the precursor to the next generation of agricultural decision support systems or inform an agricultural digital twin.

5 Conclusion

We used machine learning to combine output from process-based crop models, crop identification systems and satellite driven crop models to obtain insights into the benefits of crop rotation across the vast Western Australian Wheatbelt. The outputs identified different crop yield responses to those commonly reported in the agronomy literature. The approach identified that the yield responses varied spatially across the Western Australian Wheatbelt, and climatic factors did influence the outcome. The machine learning approach was employed to produce scenarios for every field in the 20-million-hectare farming region. Outputs from this exercise could form the basis behind future digital decision support systems or agricultural digital twins that farmers, consultants, and researchers may use.