Spatial matrix completion for spatially misaligned and high-dimensional air pollution data
Environmetrics (IF 1.5), Pub Date: 2021-12-13, DOI: 10.1002/env.2713
Phuong T. Vu, Adam A. Szpiro, Noah Simon

In health-pollution cohort studies, accurate predictions of pollutant concentrations at new locations are needed, since the locations of fixed monitoring sites and study participants are often spatially misaligned. For multi-pollution data, principal component analysis (PCA) is often incorporated to obtain low-rank (LR) structure of the data prior to spatial prediction. Recently developed predictive PCA modifies the traditional algorithm to improve the overall predictive performance by leveraging both LR and spatial structures within the data. However, predictive PCA requires complete data or an initial imputation step. Nonparametric imputation techniques without accounting for spatial information may distort the underlying structure of the data, and thus further reduce the predictive performance. We propose a convex optimization problem inspired by the LR matrix completion framework and develop a proximal algorithm to solve it. Missing data are imputed and handled concurrently within the algorithm, which eliminates the necessity of a separate imputation step. We review the connections among those existing methods developed for spatially misaligned multivariate data, and show that our algorithm has lower computational burden and leads to reliable predictive performance as the severity of missing data increases.
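As a rough illustration of the low-rank matrix completion framework the abstract refers to, the sketch below runs proximal gradient descent on a nuclear-norm-penalized least-squares objective over the observed entries. It is a generic textbook formulation, not the authors' spatial method: it uses no spatial information, and the function names (`soft_threshold_svd`, `complete_matrix`), the penalty `lam`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def soft_threshold_svd(Z, tau):
    """Proximal operator of tau * (nuclear norm): shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt


def complete_matrix(X, mask, lam=0.1, n_iter=500, tol=1e-7):
    """Proximal gradient for
        min_M 0.5 * ||mask * (X - M)||_F^2 + lam * ||M||_*
    where mask is True at observed entries. Entries of X where mask is
    False are ignored (they may be NaN).
    """
    X_obs = np.where(mask, X, 0.0)
    M = np.zeros_like(X_obs)
    step = 1.0  # the smooth term has a 1-Lipschitz gradient, so step 1 is safe
    for _ in range(n_iter):
        # Gradient of the squared-error term is nonzero only on observed entries.
        grad = np.where(mask, M - X_obs, 0.0)
        M_new = soft_threshold_svd(M - step * grad, step * lam)
        # Stop when the iterate changes very little.
        if np.linalg.norm(M_new - M) <= tol * max(1.0, np.linalg.norm(M)):
            M = M_new
            break
        M = M_new
    return M


# Hypothetical data: 30 monitoring sites x 8 pollutants, rank 2, ~30% missing.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))
observed = rng.random(truth.shape) > 0.3
X = np.where(observed, truth, np.nan)
M_hat = complete_matrix(X, observed, lam=0.1)
```

Here the missing entries are filled in as a by-product of the optimization rather than in a separate imputation step, which is the general idea the abstract describes; the paper's formulation additionally exploits spatial structure, which this sketch omits.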

Updated: 2021-12-13