A feature reconstruction-based multi-task regression model for cyanobacterial distribution forecasting along the water column
Journal of Cleaner Production (IF 11.1), Pub Date: 2021-01-19, DOI: 10.1016/j.jclepro.2021.126025
Peng Jiang, Yibin Huang, Xiao Liu, Jingjie Zhang, Karina Yew-Hoong Gin

Cyanobacterial water pollution threatens clean ecosystems and urban sustainability because of its harm to aquatic ecosystems and human health, which motivates the development of an effective forecasting tool for cyanobacterial blooms. Along the water column, cyanobacteria cell densities show varied distribution patterns and are influenced by multiple environmental factors. Most data-driven models treat cyanobacteria forecasting at a specific water depth as a single task, which fails to share knowledge amongst water depths and results in unfavourable forecasting accuracy. In response, an increasing number of nonlinear black-box models have been built for cyanobacteria forecasting, but at the expense of model interpretability. This study investigates whether forecasting accuracy and model interpretability can both be enhanced by (i) using easily accessible predictors and (ii) developing a feature reconstruction-based multi-task regression model with knowledge sharing amongst water depths. Real-world data from a tropical lake are used to evaluate the effectiveness of the model. For the studied lake, the highest average cyanobacteria cell density occurs at 1.0 m, and the density is over 30% lower at 5.5 m. The correlation coefficients of the cyanobacteria cell density time series between adjacent water depths are greater than 0.95 (P < 0.001). The forecasting results indicate that, compared to single-task nonlinear models, the proposed model reduces the mean square error by 20.59%, 16.25%, and 22.70% for one-day-ahead, two-day-ahead, and three-day-ahead cyanobacterial bloom forecasts, respectively. Under the proposed model, bloom and non-bloom signals are identified correctly in up to 94.81% and 98.28% of cases, respectively. Based on the proposed model, the relative importance of predictors, the sparsity of the regression coefficients, and the covariance relationship of the regression coefficients interpret the model adequately and elucidate the mechanism behind knowledge sharing and the improvement in forecasting accuracy.
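
The abstract reports two kinds of summary statistics: a percentage reduction in mean square error relative to a baseline, and correlation coefficients between cell density series at adjacent depths. As a minimal sketch of how such figures are commonly computed, the Python below assumes that the reduction is the relative form (baseline minus proposed, divided by baseline) and that the correlation is Pearson's r. The abstract states neither explicitly, and the function names and sample numbers are hypothetical illustrations, not data from the study.

```python
import math


def mse(y_true, y_pred):
    """Mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mse_reduction(y_true, baseline_pred, proposed_pred):
    """Percentage by which the proposed model lowers MSE versus a baseline.

    Assumed definition: 100 * (MSE_baseline - MSE_proposed) / MSE_baseline.
    """
    base = mse(y_true, baseline_pred)
    prop = mse(y_true, proposed_pred)
    if base == 0:
        raise ValueError("baseline MSE is zero; relative reduction undefined")
    return 100.0 * (base - prop) / base


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical cell densities (arbitrary units), not values from the paper.
    observed = [1.0, 2.0, 3.0, 4.0]
    single_task = [2.0, 3.0, 4.0, 5.0]   # stand-in for a single-task baseline
    multi_task = [1.5, 2.5, 3.5, 4.5]    # stand-in for a multi-task model
    print(relative_mse_reduction(observed, single_task, multi_task))

    # Hypothetical series at two adjacent depths.
    depth_a = [1.0, 2.0, 3.0]
    depth_b = [2.0, 4.0, 6.0]
    print(pearson(depth_a, depth_b))
```

Under this reading, a reported reduction of 20.59% would mean the proposed model's mean square error is about 79.41% of the baseline's for the same forecasts.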




Updated: 2021-01-28