
Statistical and machine learning methods for crop yield prediction in the context of precision agriculture

Precision Agriculture

Abstract

Understanding the relationships between crop yield, soil properties and topographic characteristics is critically important for agricultural management. This study's objective was to compare techniques that quantify how soil and topographic characteristics relate to crop yield, with the aim of predicting yield from high-resolution data. The study used a multiple-field dataset from southwestern Ontario, Canada, a region where few studies have applied precision agriculture and machine learning (ML) methods to the soil property–yield relationship. The dataset included 145,500 observations of corn and soybean yield, together with topographic and soil nutrient characteristics. The attributes considered were pH, soil organic matter (OM) content, cation exchange capacity (CEC), soil test phosphorus, zinc (Zn), potassium (K), elevation and topographic wetness index. Multiple linear regression (MLR), artificial neural networks, decision trees and random forests were compared to identify methods able to relate soil properties to crop yields at a subfield scale (2 m). Random forests were the most successful at predicting yield, with an R² of 0.85 for corn and 0.94 for soybeans. MLR was the least successful, with an R² of 0.40 for corn and 0.45 for soybeans. Cross-validation experiments showed that random forest models could, in most but not all cases, predict low- and high-yield areas in fields excluded from the training datasets. The model-testing techniques also identified significant soil and topographic attributes for predicting yield, although this identification was subject to some uncertainty. These results suggest that ML techniques might be used to predict high-yield areas in fields without existing yield maps, provided those fields have similar relationships between soil properties and yield.
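The abstract evaluates models by R² and by holding entire fields out of training. As a rough illustration of that evaluation scheme, the sketch below applies leave-one-field-out cross-validation to an MLR baseline in dependency-free Python. Everything in it is a hypothetical assumption, not the authors' code: the synthetic data, the choice of two predictors (OM and topographic wetness index), the yield units, and the function names `fit_ols`, `r2_score`, `leave_one_field_out` and `make_synthetic_fields`.

```python
import random
from statistics import mean


def fit_ols(X, y):
    """Multiple linear regression (ordinary least squares) via the normal
    equations (X^T X) b = X^T y, solved by Gauss-Jordan elimination.
    An intercept column is prepended, so coef[0] is the intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, X):
    """Apply intercept + linear coefficients to each row of X."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - q) ** 2 for t, q in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_field_out(X, y, fields):
    """Train on all fields but one, then score R^2 on the held-out field."""
    scores = {}
    for held_out in sorted(set(fields)):
        train = [i for i, f in enumerate(fields) if f != held_out]
        test = [i for i, f in enumerate(fields) if f == held_out]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        pred = predict(coef, [X[i] for i in test])
        scores[held_out] = r2_score([y[i] for i in test], pred)
    return scores


def make_synthetic_fields(n_per_field=200, seed=0):
    """Hypothetical data: yield (assumed t/ha) as a linear function of soil
    organic matter (%) and topographic wetness index, plus Gaussian noise."""
    rng = random.Random(seed)
    X, y, fields = [], [], []
    for field in ("A", "B", "C"):
        for _ in range(n_per_field):
            om = rng.uniform(2.0, 5.0)
            twi = rng.uniform(4.0, 10.0)
            X.append([om, twi])
            y.append(8.0 + 0.6 * om + 0.3 * twi + rng.gauss(0.0, 0.2))
            fields.append(field)
    return X, y, fields


if __name__ == "__main__":
    X, y, fields = make_synthetic_fields()
    for field, r2 in leave_one_field_out(X, y, fields).items():
        print(f"held-out field {field}: R^2 = {r2:.3f}")
```

Each held-out field receives its own R², which loosely mirrors the field-exclusion experiments described in the abstract. To compare nonlinear learners, one could swap `fit_ols`/`predict` for scikit-learn's `RandomForestRegressor` and generate the splits with `LeaveOneGroupOut`. That substitution is an assumption about tooling, not a description of the study's pipeline.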




Data availability

Not applicable.

Code availability

Not applicable.


Funding

This work was funded by an NSERC Discovery Grant to CW and by Canada-Ontario Agreement Grants COA/GLS 1009C and 1010D-16.

Author information

Contributions

CW obtained funding for the research; CW and HB designed the research; HB conducted the research, including coding and data management, with input and guidance from CW; and HB wrote the paper, with input and guidance from CW.

Corresponding author

Correspondence to Hannah Burdett.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Electronic supplementary material: Supplementary file 1 (DOCX, 14 kb).


About this article


Cite this article

Burdett, H., Wellen, C. Statistical and machine learning methods for crop yield prediction in the context of precision agriculture. Precision Agric 23, 1553–1574 (2022). https://doi.org/10.1007/s11119-022-09897-0


  • DOI: https://doi.org/10.1007/s11119-022-09897-0
