Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series
Advanced Engineering Informatics (IF 8.8) Pub Date: 2020-04-03, DOI: 10.1016/j.aei.2020.101092
Jun Ma, Jack C.P. Cheng, Yuexiong Ding, Changqing Lin, Feifeng Jiang, Mingzhu Wang, Chong Zhai

Air pollution has become one of the world’s largest health and environmental problems. Studies on air quality prediction, influential factor analysis, and control policy evaluation are increasing. These studies require valid, high-quality air pollution data to produce reasonable results. Missing data, which are frequent in collected raw data, have therefore become a significant barrier. Existing methods for handling missing data either cannot effectively capture the temporal and spatial mechanisms of air pollution or focus on sequences with low missing rates and random missing positions. To address this problem, this paper proposes a new imputation methodology, transferred long short-term memory-based iterative estimation (TLSTM-IE), to impute consecutive missing values at high missing rates. A case study in New York City verifies the effectiveness and superiority of the proposed methodology by filling long-interval consecutive gaps in PM2.5 concentration data. Experimental results show that the proposed model can effectively learn long-term dependencies and transfer the learned knowledge. The imputation accuracy of the TLSTM-IE model is 25–50% higher than that of other commonly used methods. The novelty of this study lies in two aspects. First, it targets long-interval consecutive missing data, which existing studies in atmospheric research have not addressed. Second, it applies transfer learning to missing value imputation; to the best of our knowledge, no air quality research has applied this technique to this problem before.
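To make the problem setting concrete, the following minimal Python sketch masks one long consecutive interval in a synthetic hourly series and compares two ways of filling it. It is not the TLSTM-IE model: the abstract gives no implementation details, so the LSTM and the transfer step are not reproduced. The one-step predictor is a seasonal-naive stand-in (an assumption), the data are synthetic rather than New York City PM2.5 readings, and every function name is hypothetical. It illustrates only the idea of iterative estimation, where each filled value can feed later estimates inside the gap.

```python
import math
import random


def make_series(n, period=24, seed=0):
    """Synthetic hourly series with a daily cycle (stand-in for PM2.5 readings)."""
    rng = random.Random(seed)
    return [10.0 + 5.0 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 0.5)
            for t in range(n)]


def mask_consecutive(series, start, length):
    """Copy the series and mark one long consecutive interval as missing (None)."""
    out = list(series)
    out[start:start + length] = [None] * length
    return out


def fill_linear(series, start, length):
    """Baseline: straight line between the observations bordering the gap."""
    out = list(series)
    left, right = out[start - 1], out[start + length]
    for k in range(length):
        out[start + k] = left + (right - left) * (k + 1) / (length + 1)
    return out


def fill_iteratively(series, start, length, period=24):
    """Fill left to right; each estimate may feed later estimates in the gap.

    The one-step predictor is seasonal-naive (the value one period earlier),
    an assumption standing in for a trained model. Requires start >= period.
    """
    out = list(series)
    for t in range(start, start + length):
        out[t] = out[t - period]
    return out


def rmse(truth, filled, start, length):
    """Root-mean-square error over the gap only."""
    sq = [(truth[t] - filled[t]) ** 2 for t in range(start, start + length)]
    return math.sqrt(sum(sq) / length)


truth = make_series(24 * 14)   # two weeks of hourly values
start, length = 120, 72        # one three-day consecutive gap
gapped = mask_consecutive(truth, start, length)
print("linear   :", round(rmse(truth, fill_linear(gapped, start, length), start, length), 3))
print("iterative:", round(rmse(truth, fill_iteratively(gapped, start, length), start, length), 3))
```

Because the gap spans several daily cycles, the straight-line baseline cannot follow the periodic pattern, while the iterative fill reuses earlier values and its own estimates. This mirrors, in a very simplified form, why long consecutive gaps are harder than short random ones.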




Updated: 2020-04-03