Evaluation of white-box versus black-box machine learning models in estimating ambient black carbon concentration, Journal of Aerosol Science



Journal of Aerosol Science (IF 3.9), Pub Date: 2021-02-01, DOI: 10.1016/j.jaerosci.2020.105694
Pak L. Fung, Martha A. Zaidan, Hilkka Timonen, Jarkko V. Niemi, Anu Kousa, Joel Kuula, Krista Luoma, Sasu Tarkoma, Tuukka Petäjä, Markku Kulmala, Tareq Hussein

Abstract: Air quality prediction with black-box (BB) modelling is gaining widespread interest in research and industry. These data-driven models generally achieve better accuracy, but they are limited in capturing physical, chemical and meteorological processes and therefore offer limited accountability for interpretation. In this paper, we evaluated different white-box (WB) and BB methods that estimate atmospheric black carbon (BC) concentration from a suite of observations made at the same measurement site. This study uses data from 1 January 2017 to 31 December 2018 from two measurement sites in Helsinki, Finland: a street canyon site in Makelankatu and an urban background site in Kumpula. At the street canyon site, WB models (R² = 0.81–0.87) performed similarly to BB models (R² = 0.86–0.87). The overall performance of the BC estimation methods at the urban background site was much worse, probably because of a combination of smaller dynamic variability in BC values and longer data gaps. However, the difference between WB (R² = 0.44–0.60) and BB models (R² = 0.41–0.64) was not significant. Furthermore, WB models are closer to physics-based models, making it easier to spot the relative importance of the predictor variables and to determine whether the model output makes sense. This advantage outweighs the slightly higher performance of some individual BB models, and the transparency of their model architecture makes WB models inherently the better choice. Among the WB models, IAP and LASSO are recommended for their flexibility and efficiency, respectively. Our findings also confirm the importance of temporal properties in statistical modelling. In the future, the developed BC estimation model could serve as a virtual sensor and complement current air quality monitoring.
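To make the white-box argument concrete, here is a minimal sketch of a LASSO-based BC proxy. It is not the authors' pipeline: the data are synthetic, the four predictors (`pred_1` to `pred_4`) and the regularization strength `alpha=0.05` are hypothetical, and LASSO is fitted with plain cyclic coordinate descent in NumPy. R² is computed here as the usual coefficient of determination, 1 − SS_res/SS_tot, which is an assumption since the abstract does not state the paper's exact definition.

```python
import numpy as np


def standardize(X):
    """Scale each predictor column to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / sd, mu, sd


def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrinks z toward zero by gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-10):
    """LASSO by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X w||^2 + alpha * ||w||_1,
    assuming X is standardized and y is centered (so no intercept term).
    """
    n, p = X.shape
    w = np.zeros(p)
    residual = y.astype(float).copy()  # y - X @ w, with w starting at zero
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * x_j^T x_j for each column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = X[:, j] @ residual / n + w[j] * col_sq[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * (new_wj - w[j])  # keep residual in sync
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break
    return w


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic example: four hypothetical predictors, only the first two matter.
rng = np.random.default_rng(0)
names = ["pred_1", "pred_2", "pred_3", "pred_4"]  # hypothetical names
X = rng.normal(size=(500, 4))
bc = 0.8 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * rng.normal(size=500)

Xs, mu, sd = standardize(X)
w = lasso_cd(Xs, bc - bc.mean(), alpha=0.05)
bc_hat = Xs @ w + bc.mean()
r2 = r_squared(bc, bc_hat)

for name, coef in zip(names, w):
    print(f"{name}: {coef:+.3f}")  # zero means the predictor was dropped
print(f"R^2 = {r2:.3f}")
```

Because the coefficients refer to standardized predictors, their magnitudes can be compared directly as relative importance, and predictors that add nothing are driven exactly to zero. This is the kind of transparency the abstract credits to WB models.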
Main findings: White-box models are preferred over black-box models for estimating black carbon because they are closer to physics-based models and make it easier to spot the relative importance of the predictor variables. The black carbon model could serve as a virtual sensor integrated into the air quality network in support of real measurements, so as to complement the current air quality index.
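The virtual-sensor role can be illustrated with a small gap-filling sketch. This is an assumption about how such a sensor might be wired into a monitoring network, not the authors' implementation: a measured BC value is kept whenever one exists, and a model estimate computed from the same time step's predictors fills the gaps. The function `virtual_sensor`, the `toy_model` coefficients and the data are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def virtual_sensor(
    measured: Sequence[Optional[float]],
    predictors: Sequence[Sequence[float]],
    model: Callable[[Sequence[float]], float],
) -> List[Tuple[float, str]]:
    """Return one BC value per time step, tagged with its source.

    A real measurement is used whenever it exists (not None); otherwise the
    model estimate computed from that time step's predictors fills the gap.
    """
    series = []
    for obs, x in zip(measured, predictors):
        if obs is not None:
            series.append((obs, "measured"))
        else:
            series.append((model(x), "estimated"))
    return series


# Hypothetical fitted white-box model: BC = 0.5 + 0.8 * x0 + 0.3 * x1.
def toy_model(x: Sequence[float]) -> float:
    return 0.5 + 0.8 * x[0] + 0.3 * x[1]


measured_bc = [1.2, None, 0.9, None]  # None marks a data gap
predictor_rows = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.1], [0.0, 0.0]]
filled = virtual_sensor(measured_bc, predictor_rows, toy_model)
print(filled)
```

Tagging each value with its source keeps estimated points distinguishable from real measurements, which matters if the combined series feeds an air quality index.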

