Time Series Data Cleaning: From Anomaly Detection to Anomaly Repairing (Technical Report)
arXiv - CS - Databases, Pub Date: 2020-03-27, DOI: arxiv-2003.12396
Aoqian Zhang, Shaoxu Song, Jianmin Wang, Philip S. Yu

Errors are prevalent in time series data, such as GPS trajectories or sensor readings. Existing methods focus more on anomaly detection than on repairing the detected anomalies. When dirty data is simply filtered out via anomaly detection, applications can still be unreliable over the resulting incomplete time series. Instead of discarding anomalies, we propose to repair them iteratively, combining the temporal nature exploited in anomaly detection with the widely used minimum change principle in data repairing. Our major contributions include: (1) a novel framework of iterative minimum repairing (IMR) over time series data, (2) an explicit analysis of the convergence of the proposed iterative minimum repairing, and (3) efficient estimation of parameters in each iteration. Remarkably, with incremental computation, we reduce the complexity of parameter estimation from O(n) to O(1). Experiments on real datasets demonstrate the superiority of our proposal over state-of-the-art approaches. In particular, we show that the proposed repairing indeed improves time series classification.
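
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an iterative minimum-change repair loop with incrementally maintained parameter sums might look. Everything specific here is an assumption made for illustration, not the authors' method: the AR(1) model on the difference between the repaired and observed series, the rule that picks the smallest candidate change at or above a threshold `tau`, and the names `iterative_min_repair`, `labels`, and `tau`.

```python
def iterative_min_repair(x, labels, tau=0.1, max_iter=1000):
    """Illustrative sketch only (assumed model, not the paper's algorithm).

    x      : observed (possibly dirty) series
    labels : {index: trusted value} for a few manually repaired points
    tau    : candidate changes smaller than this are ignored
    """
    n = len(x)
    y = list(x)
    for i, v in labels.items():
        y[i] = v
    # z[t] is the current repair offset at t (repaired minus observed).
    z = [yi - xi for yi, xi in zip(y, x)]

    # One O(n) pass initialises the least-squares sums for
    # the assumed model z[t] ~ phi * z[t-1].
    num = sum(z[t] * z[t - 1] for t in range(1, n))
    den = sum(z[t - 1] ** 2 for t in range(1, n))

    for _ in range(max_iter):
        phi = num / den if den else 0.0

        # Candidate generation still scans the series; only the
        # parameter estimate itself is kept O(1) per iteration.
        best = None
        for t in range(1, n):
            if t in labels:
                continue  # trusted points are never modified
            cand = x[t] + phi * z[t - 1]
            change = abs(cand - y[t])
            if change >= tau and (best is None or change < best[0]):
                best = (change, t, cand)
        if best is None:
            break  # no candidate moves a point by tau or more
        _, t, cand = best

        # O(1) update: only the terms that involve z[t] change.
        old, new = z[t], cand - x[t]
        neighbours = z[t - 1] + (z[t + 1] if t + 1 < n else 0.0)
        num += (new - old) * neighbours
        if t < n - 1:  # z[n-1] never appears in the denominator
            den += new * new - old * old
        y[t], z[t] = cand, new
    return y


# Hypothetical data: indices 2..4 carry a +5 shift, two of them are labeled.
x = [1.0, 1.1, 6.2, 6.3, 6.4, 1.5, 1.6]
labels = {2: 1.2, 3: 1.3, 5: 1.5}
repaired = iterative_min_repair(x, labels)
```

In this toy run only index 4 is modified: it is pulled from 6.4 to roughly 3.07 over two repair steps, while the labeled and clean points stay as they are. Applying one small repair per iteration means a single offset changes each time, which is what lets the two sums be patched in constant time instead of being recomputed over the whole series.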

Updated: 2020-03-30