REMIAN
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2020-09-29, DOI: 10.1145/3412364
Qian Ma 1 , Yu Gu 2 , Wang-Chien Lee 3 , Ge Yu 2 , Hongbo Liu 1 , Xindong Wu 4

Missing value (MV) imputation is a critical preprocessing step for data mining. Nevertheless, existing MV imputation methods are mostly designed for batch processing, and thus are not applicable to streaming data, especially streams of poor quality. In this article, we propose a framework, called Real-time and Error-tolerant Missing vAlue ImputatioN (REMAIN), to impute MVs in poor-quality streaming data. Instead of imputing MVs based on all the observed data, REMAIN first initializes the MV imputation model with a-RANSAC, which detects and rejects anomalies efficiently, and then incrementally updates the model parameters as new data arrive to support real-time MV imputation. As the correlations among attributes of the data may change over time in unforeseeable ways, we devise a deterioration detection mechanism that captures the deterioration of the imputation model and thereby further improves imputation accuracy. Finally, we conduct an extensive evaluation of the proposed algorithms using real-world and synthetic datasets. Experimental results demonstrate that REMAIN achieves significantly higher imputation accuracy than existing solutions. Meanwhile, REMAIN reduces time cost by up to one order of magnitude compared with existing approaches.
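
The three stages named in the abstract (robust initialization, incremental updates, deterioration detection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single-predictor linear model y = a*x + b, substitutes plain RANSAC for the paper's a-RANSAC, updates the model through running sums, and treats a high mean error over a recent window as deterioration. The names `fit_line`, `ransac_init`, `StreamingImputer`, `tol`, `window`, and `max_err` are hypothetical and chosen only for this illustration.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def ransac_init(points, trials=50, tol=1.0, seed=0):
    """Plain RANSAC (an assumption, not a-RANSAC): return the largest consensus set."""
    rng = random.Random(seed)
    best = []
    for _ in range(trials):
        a, b = fit_line(rng.sample(points, 2))
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best


class StreamingImputer:
    """Hypothetical streaming imputer: robust init, O(1) updates, windowed refit."""

    def __init__(self, history, tol=1.0, window=5, max_err=2.0):
        self.tol, self.window, self.max_err = tol, window, max_err
        self.recent = []  # most recent complete tuples, used for refits
        self._reset(ransac_init(history, tol=tol))

    def _reset(self, points):
        # Rebuild the running sums from scratch and clear the error window.
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0
        self.errors = []
        for x, y in points:
            self._accumulate(x, y)

    def _accumulate(self, x, y):
        # Incremental update: only the running sums change.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def impute(self, x):
        den = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / den if den else 0.0
        b = (self.sy - a * self.sx) / self.n
        return a * x + b

    def observe(self, x, y):
        """Process one tuple; y is None when missing. Returns y or its imputation."""
        if y is None:
            return self.impute(x)
        err = abs(y - self.impute(x))
        self.recent = (self.recent + [(x, y)])[-self.window:]
        self.errors = (self.errors + [err])[-self.window:]
        full = len(self.errors) == self.window
        if full and sum(self.errors) / self.window > self.max_err:
            self._reset(self.recent)  # deterioration detected: refit on recent data
        elif err <= self.tol:
            self._accumulate(x, y)  # update with inliers only, skip anomalies
        return y


if __name__ == "__main__":
    history = [(x, 2 * x + 1) for x in range(20)]
    history[3] = (3, 100)  # injected anomaly
    imputer = StreamingImputer(history)
    print(imputer.observe(10, None))  # imputed from the robustly fitted line
```

Running sums keep each update constant-time, which is one simple way to meet a real-time constraint; a multi-attribute model would need a different incremental scheme, such as recursive least squares.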

Updated: 2020-09-29