On the automated learning of air pollution prediction models from data collected by mobile sensor networks
Energy Sources, Part A: Recovery, Utilization, and Environmental Effects (IF 2.9), Pub Date: 2021-08-28, DOI: 10.1080/15567036.2021.1968076
Pedro Mariano 1, 2, 3, Susana Marta Almeida 3, Pedro Santana 1, 2

ABSTRACT

This paper addresses the automated learning of air pollution prediction models trained on data gathered by a set of mobile low-cost sensors. Concretely, fast-to-compute machine learning methods (Decision Trees and Support Vector Machines) were used to build regression models that predict air pollution levels for a given location. The models were trained on data collected by the OpenSense project, in particular particle number, particle diameter, and lung-deposited surface area (LDSA). We examined two different sets of attributes: one based on a geographical description of the location under analysis (e.g. the distribution of households and roads), and another based on a time series of past air pollution observations at that location. Overall, we found that past measurements lead to better pollution predictions. The best R² score, 0.751, was obtained by the model that predicts LDSA and was trained on the data set with time-series attributes, while the worst R², 0.009, was obtained when the geographical data set was used to predict particle number. The performance of the best model is on par with that of similar air pollution systems. Moreover, it can be used in a production system that requires frequent updates.
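To make the time-series attribute set and the R² score concrete, here is a minimal Python sketch. It is not the authors' pipeline: the helper names (`make_lagged_dataset`, `r2_score`), the choice of two lags, and the readings are all hypothetical, and a naive "repeat the last observation" baseline stands in for the Decision Tree and Support Vector Machine regressors named in the abstract.

```python
from typing import List, Sequence, Tuple


def make_lagged_dataset(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, targets) where each feature row holds the
    n_lags observations that immediately precede the target value."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(series)):
        features.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return features, targets


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical readings at one location (made-up numbers, not OpenSense data).
readings = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

# Assumption: two past observations per example; the abstract gives no window size.
X, y = make_lagged_dataset(readings, n_lags=2)

# Stand-in predictor: repeat the most recent past observation.
baseline = [row[-1] for row in X]
score = r2_score(y, baseline)
print(f"R^2 of the persistence baseline: {score:.3f}")
```

A trained regressor would replace the baseline line, taking `X` as input and `y` as the target. R² can be negative when a model fits worse than always predicting the mean, which is why scores near zero, such as the 0.009 reported for the geographical attributes, indicate almost no explanatory power.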




Updated: 2021-08-29