Missing data imputation with adversarially-trained graph convolutional networks.
Neural Networks (IF 6.0), Pub Date: 2020-06-13, DOI: 10.1016/j.neunet.2020.06.005
Indro Spinelli, Simone Scardapane, Aurelio Uncini
Missing data imputation (MDI) is the task of replacing missing values in a dataset with alternative, predicted ones. Because of the widespread presence of missing data, it is a fundamental problem in many scientific disciplines. Popular methods for MDI use global statistics computed from the entire dataset (e.g., the feature-wise medians), or build predictive models operating independently on every instance. In this paper we propose a more general framework for MDI, leveraging recent work in the field of graph neural networks (GNNs). We formulate the MDI task in terms of a graph denoising autoencoder, where each edge of the graph encodes the similarity between two patterns. A GNN encoder learns to build intermediate representations for each example by interleaving classical projection layers with layers that locally combine information between neighbors, while a second, decoding GNN learns to reconstruct the full imputed dataset from this intermediate embedding. In order to speed up training and improve performance, we use a combination of multiple losses, including an adversarial loss implemented with the Wasserstein metric and a gradient penalty. We also explore a few extensions to the basic architecture, involving the use of residual connections between layers and of global statistics computed from the dataset, to improve accuracy. In a large experimental evaluation with varying levels of artificial noise, we show that our method is on par with or better than several alternative imputation methods. On three datasets with pre-existing missing values, we show that our method is robust to the choice of the downstream classifier, obtaining similar or slightly higher results compared to other choices.
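The abstract only describes the architecture at a high level. Below is a minimal, hypothetical PyTorch sketch (not the authors' code) of the general idea: a graph denoising autoencoder whose encoder and decoder are simple graph convolutions over a precomputed, normalized pattern-similarity matrix `a_hat`, trained with a reconstruction loss on the observed entries and a Wasserstein-style adversarial loss with a gradient penalty. All names (`GraphConv`, `GraphDenoisingAE`, `critic_losses`, `autoencoder_loss`) and hyperparameters (`hidden`, `gp_weight`, `adv_weight`), as well as the single-layer depth and the critic being any callable mapping (features, a_hat) to per-example scores, are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution step: H' = act(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim, act=True):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.act = act

    def forward(self, h, a_hat):
        # a_hat: (n, n) normalized similarity matrix between the n patterns
        h = self.lin(a_hat @ h)
        return torch.relu(h) if self.act else h

class GraphDenoisingAE(nn.Module):
    """GNN encoder/decoder that reconstructs the full dataset from a noisy input."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.enc = GraphConv(n_features, hidden)
        self.dec = GraphConv(hidden, n_features, act=False)

    def forward(self, x, a_hat):
        z = self.enc(x, a_hat)       # intermediate embedding for each example
        return self.dec(z, a_hat)    # imputed (reconstructed) dataset

def critic_losses(critic, real, fake, a_hat, gp_weight=10.0):
    """Wasserstein-style critic loss with a gradient penalty on interpolated samples."""
    wasserstein = critic(fake, a_hat).mean() - critic(real, a_hat).mean()

    eps = torch.rand(real.size(0), 1)
    inter = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(inter, a_hat).sum(), inter,
                                create_graph=True)[0]
    penalty = ((grads.norm(2, dim=1) - 1) ** 2).mean()
    return wasserstein + gp_weight * penalty

def autoencoder_loss(model, critic, x_noisy, x, mask, a_hat, adv_weight=1.0):
    """Reconstruction loss on observed entries plus the adversarial term."""
    x_hat = model(x_noisy, a_hat)
    recon = ((x_hat - x)[mask] ** 2).mean()    # fit only the observed values
    adv = -critic(x_hat, a_hat).mean()         # push imputations toward the critic's "real" side
    return recon + adv_weight * adv
```

In a training loop one would alternate updates of the critic (using `critic_losses`) and of the autoencoder (using `autoencoder_loss`), exactly as in standard WGAN-GP training; the residual connections and dataset-level statistics mentioned in the abstract are omitted here for brevity.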



Updated: 2020-06-13