Abstract
Understanding the relationships between crop yield, soil properties and topographic characteristics is critical for agricultural management. The objective of this study was to compare techniques for quantifying how soil and topographic characteristics relate to crop yield, and for predicting yield from them, using high-resolution data. The study used a multi-field dataset from Southwestern Ontario, Canada, a region where few studies have applied precision agriculture and machine learning (ML) to the soil property-yield relationship. The dataset included 145,500 observations of corn and soybean yield, topographic characteristics and soil nutrient characteristics. The attributes considered were pH, soil organic matter (OM) content, cation exchange capacity (CEC), soil test phosphorus, zinc (Zn), potassium (K), elevation and topographic wetness index. Multiple linear regression (MLR), artificial neural networks, decision trees and random forests were compared to identify methods able to relate soil properties to crop yields at a subfield scale (2 m). Random forests were the most successful at predicting yield, with an R² of 0.85 for corn and 0.94 for soybeans. MLR was the least successful, with an R² of 0.40 for corn and 0.45 for soybeans. Cross-validation experiments showed that random forest models could, in most but not all cases, predict low- and high-yield areas in fields excluded from the training datasets. The models tested also identified soil and topographic attributes that were significant when predicting yield, though this identification was subject to some uncertainty. These results suggest that ML techniques might be used to predict high-yield areas of fields without existing yield maps, provided those fields have similar relationships between soil properties and yield.
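The comparison described above can be illustrated with a minimal sketch, assuming scikit-learn is available. It is not the study's code or data: the observations are synthetic, the yield response is an invented nonlinear function, the hyperparameters are arbitrary, and names such as `make_synthetic_fields` and `leave_one_field_out_r2` are hypothetical. Each model is fitted on all fields but one and scored by R² on the held-out field, which is one plausible way to mimic predicting yield for fields excluded from training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Attribute names follow the abstract; the values generated below are synthetic.
FEATURES = ["pH", "OM", "CEC", "P", "Zn", "K", "elevation", "TWI"]


def make_synthetic_fields(n_fields=4, n_per_field=300, seed=0):
    """Return (X, y, field_id) for an invented nonlinear yield response."""
    rng = np.random.default_rng(seed)
    n = n_fields * n_per_field
    X = rng.normal(size=(n, len(FEATURES)))
    ph, om, elev, twi = X[:, 0], X[:, 1], X[:, 6], X[:, 7]
    # Assumed response: nonlinear in OM and elevation, with a pH x TWI interaction.
    y = 10 + 2 * np.sin(2 * om) + ph * twi - elev**2 + rng.normal(scale=0.3, size=n)
    field_id = np.repeat(np.arange(n_fields), n_per_field)
    return X, y, field_id


def build_models(seed=0):
    """The four model families named in the abstract, with arbitrary settings."""
    return {
        "MLR": LinearRegression(),
        "ANN": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=seed),
        ),
        "Decision tree": DecisionTreeRegressor(max_depth=8, random_state=seed),
        "Random forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }


def leave_one_field_out_r2(model, X, y, field_id):
    """Mean R2 over folds in which one whole field is held out of training."""
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, field_id):
        model.fit(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores))


X, y, field_id = make_synthetic_fields()
results = {
    name: leave_one_field_out_r2(model, X, y, field_id)
    for name, model in build_models().items()
}
for name, r2 in results.items():
    print(f"{name}: mean held-out-field R2 = {r2:.2f}")
```

Because every synthetic field shares the same response function, held-out fields are predictable here by construction. With real fields, that holds only to the extent that the soil property-yield relationships are similar across fields, which is the caveat the abstract raises.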
Data availability
Not applicable.
Code availability
Not applicable.
Funding
NSERC Discovery Grant to CW, Canada-Ontario Agreement Grants COA/GLS 1009C and 1010D-16.
Author information
Contributions
CW obtained funding for the research; CW and HB designed the research; HB conducted the research, including coding and data management, with input and guidance from CW; and HB wrote the paper, with input and guidance from CW.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Burdett, H., Wellen, C. Statistical and machine learning methods for crop yield prediction in the context of precision agriculture. Precision Agric 23, 1553–1574 (2022). https://doi.org/10.1007/s11119-022-09897-0