Wild blueberry yield prediction using a combination of computer simulation and machine learning algorithms
Computers and Electronics in Agriculture (IF 8.3), Pub Date: 2020-11-01, DOI: 10.1016/j.compag.2020.105778
Efrem Yohannes Obsie, Hongchun Qu, Francis Drummond

Abstract: The most challenging task in the agricultural sector is to accurately predict crop yield. A typical machine learning algorithm uses real data to predict crop yield. In this study, we instead used data generated by the Wild Blueberry Pollination Model, a spatially explicit simulation model validated against field observations and experimental data collected in Maine, USA, over the last 30 years. The main aim of this study is to evaluate the relative importance of bee species composition and weather factors in regulating wild blueberry agroecosystems. Specifically, we sought to reveal how bee species composition and weather affect yield, and to predict the bee species composition and weather conditions that achieve the best yield, using computer simulation and machine learning algorithms. Multiple linear regression (MLR), boosted decision trees (BDT), random forest (RF), and extreme gradient boosting (XGBoost) were evaluated as predictive tools. We also performed predictor selection before submitting our data to the learning algorithms, which allowed us to reduce the dimension of the input without a significant drop in prediction accuracy. As a result, clone size, honeybee, bumblebee, Andrena bee species, Osmia bee species, maximum of upper-temperature ranges, and the number of days with precipitation were chosen as the best predictor variable subset. The results showed that XGBoost outperformed the other algorithms in all measures of model performance for predicting wild blueberry yield, achieving a coefficient of determination (R²) of 0.938, a root mean square error (RMSE) of 343.026, a mean absolute error (MAE) of 206, and a relative root mean square error (RRMSE) of 5.444%. The results are consistent with previous work by Zaman et al. (2008) on predicting wild blueberry fruit yield using digital color photography. This study showed that crop yield predictions can be based on computer simulation modeling datasets. Therefore, if a reasonable prediction can be reached, this study should have a significant impact, especially when data collection in the field is challenging.
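
The abstract reports four performance measures (R², RMSE, MAE, RRMSE) without stating their formulas. The following is a minimal sketch of how such measures are commonly computed, not the authors' code. The function name `regression_metrics` and the sample numbers are hypothetical, and RRMSE is assumed here to be RMSE divided by the mean observed value, expressed as a percentage; the paper may define it differently.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE and RRMSE (%) for paired observed/predicted values.

    Assumption: RRMSE = 100 * RMSE / mean(observed). This is a common
    convention, not a definition taken from the paper.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    if ss_tot == 0 or mean_obs == 0:
        raise ValueError("R2 and RRMSE are undefined for constant or zero-mean observations")

    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "mae": sum(abs(r) for r in residuals) / n,
        "rrmse_percent": 100.0 * rmse / mean_obs,
    }


if __name__ == "__main__":
    # Made-up yields for illustration only; these are not data from the study.
    observed = [2.0, 4.0, 6.0, 8.0]
    predicted = [3.0, 3.0, 7.0, 7.0]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

Under this convention RMSE and MAE carry the units of the yield variable, while R² and RRMSE are unitless, which is why RRMSE is the one reported as a percentage.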

Updated: 2020-11-01