Environmental Modelling & Software (IF 4.8), Pub Date: 2021-03-31, DOI: 10.1016/j.envsoft.2021.105048
Authors: Marius Zumwald, Christoph Baumberger, David N. Bresch, Reto Knutti
Data-driven modelling with machine learning (ML) is already being used for predictions in environmental science. However, it is less clear to what extent data-driven models that successfully predict a phenomenon are representationally accurate and thus increase our understanding of the phenomenon. Besides empirical accuracy, we propose three criteria to indirectly assess the relationships learned by the ML algorithms and how they relate to a phenomenon under investigation: first, consistency of the outcomes with background knowledge; second, the adequacy of the measurements, datasets and methods used to construct a data-driven model; third, the robustness of interpretable machine learning analyses across different ML algorithms. We apply the three criteria in a case study that models the effect of different urban green infrastructure types on temperature and show that our approach improves the assessment of representational accuracy and reduces representational uncertainty, which can improve the understanding of modelled phenomena.
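One way to make the third criterion concrete is to check whether different ML algorithms rank the same input features in a similar order of importance. The sketch below is only an illustration under stated assumptions, not the authors' method: it measures agreement with the Spearman rank correlation, and the feature names, algorithm names and importance scores are all hypothetical placeholders rather than results from the paper.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks (1 = largest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions in the tie group
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def spearman(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("rank correlation is undefined for constant scores")
    return cov / (va * vb) ** 0.5


def pairwise_agreement(importances):
    """Map each pair of algorithm names to the Spearman correlation of their scores."""
    return {
        (m1, m2): spearman(importances[m1], importances[m2])
        for m1, m2 in combinations(sorted(importances), 2)
    }


# Hypothetical feature names and importance scores, made up for illustration only.
FEATURES = ["trees", "grass", "water", "buildings"]
IMPORTANCES = {
    "random_forest": [0.40, 0.25, 0.20, 0.15],
    "gradient_boosting": [0.35, 0.30, 0.15, 0.20],
    "neural_network": [0.45, 0.20, 0.25, 0.10],
}

if __name__ == "__main__":
    for pair, rho in pairwise_agreement(IMPORTANCES).items():
        print(pair, round(rho, 2))
```

High pairwise correlations would suggest that the learned feature rankings do not depend strongly on the choice of algorithm. Low values would suggest that an interpretation drawn from any single model should be treated with caution.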
Title: Assessing the representational accuracy of data-driven models: the effect of urban green infrastructure on temperature