Comparing Regression Models with Count Data to Artificial Neural Network and Ensemble Models for Prediction of Generic Escherichia coli Population in Agricultural Ponds Based on Weather Station Measurements
Microbial Risk Analysis (IF 3.0) Pub Date: 2021-04-02, DOI: 10.1016/j.mran.2021.100171
Gonca Buyrukoğlu, Selim Buyrukoğlu, Zeynal Topalcengiz

Indicator microorganisms are monitored in agricultural waters to foster produce safety. Various prediction models are used to estimate the population of indicator microorganisms and pathogens when no observation is available. The purpose of this study was to compare the performance of regression models for count data (zero-inflated Poisson and hurdle negative binomial) with that of artificial neural network and ensemble models (random forest and AdaBoost) for predicting the generic Escherichia coli population in agricultural surface waters from weather station measurements. Two-part count data models were built on E. coli population count frequencies (0, [1,10), [10,100), [100,1000), [1000,10000), and >=10000) based on the data structure. The artificial neural network, AdaBoost, and random forest models were selected from six pre-tested models based on the mean absolute error (MAE). The MAE was also used to compare the performance of the two-part count data models with that of the artificial neural network and ensemble models. Over-dispersed E. coli population count frequencies were calculated to be between 2.2 and 52.2% for all ponds. Across all ponds, the agreement between observed and predicted zero E. coli population counts ranged from 82 to 100% for the zero-inflated Poisson model and was 100% for the hurdle negative binomial regression model. Overdispersion reduced the performance of the tested models. AdaBoost with twelve estimators had the best performance, with the lowest MAE values for all ponds (from 0.87 to 46.60). The ensemble models used in this study performed better than the tested regression models for count data.
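To make the two quantities named in the abstract concrete, here is a minimal Python sketch of assigning a count to one of the listed frequency classes and of computing the MAE between observed and predicted counts. It is an illustration only, not the authors' code: the function names `count_class` and `mean_absolute_error` and the sample numbers are hypothetical, and only the class boundaries come from the abstract.

```python
from bisect import bisect_right

# Lower edges of the non-zero classes listed in the abstract:
# [1,10), [10,100), [100,1000), [1000,10000), >=10000
EDGES = [1, 10, 100, 1000, 10000]
LABELS = ["0", "[1,10)", "[10,100)", "[100,1000)", "[1000,10000)", ">=10000"]


def count_class(count):
    """Return the frequency-class label for a non-negative integer count.

    Hypothetical helper: the abstract gives the classes, not how they
    were computed.
    """
    if count < 0:
        raise ValueError("counts must be non-negative")
    # bisect_right returns how many lower edges are <= count,
    # which is exactly the index of the class the count falls in.
    return LABELS[bisect_right(EDGES, count)]


def mean_absolute_error(observed, predicted):
    """Mean of |observed - predicted| over paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    observed = [0, 5, 120, 10000]
    predicted = [0, 8, 100, 9500]
    print([count_class(c) for c in observed])
    print(mean_absolute_error(observed, predicted))
```

A lower MAE means predictions are closer to the observed counts on average, which is the sense in which the abstract ranks the models.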




Updated: 2021-04-02