TurboLift: fast accuracy lifting for historical data recovery
The VLDB Journal (IF 4.2), Pub Date: 2020-03-09, DOI: 10.1007/s00778-020-00609-6
Fan Yang, Faisal M. Almutairi, Hyun Ah Song, Christos Faloutsos, Nicholas D. Sidiropoulos, Vladimir Zadorozhny

Historical data frequently arise in situations where the available time-series reports are temporally aggregated at different levels, e.g., monthly counts of people infected with measles. In real databases, the time periods covered by different reports can overlap (i.e., some time-ticks are covered by more than one report) or leave gaps (i.e., some time-ticks are not covered by any report). However, data analysis and machine learning models require reconstructing historical events at a finer granularity, e.g., weekly patient counts, for detailed analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. Time series disaggregation methods commonly use domain knowledge about the data, e.g., smoothness, periodicity, or sparsity, to improve reconstruction accuracy. In this paper, we propose a novel approach, called TurboLift, which aims to improve the quality of the solutions produced by existing disaggregation methods. Starting from a solution produced by a given method, TurboLift finds a new solution that reduces the disaggregation error while remaining close to the initial one. We derive a closed-form solution to the proposed TurboLift formulation, which yields an accurate reconstruction analytically, without resource- and time-consuming iterations. Experiments on real data from different domains show the effectiveness of TurboLift in reducing disaggregation error and in detecting outliers and anomalies.
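The abstract does not state TurboLift's exact objective, so the following is only a minimal sketch under an assumed formulation, meant to make the idea of "lifting" an initial solution concrete. Reports are modeled as y = A x, where A is a 0/1 matrix whose row i marks the time-ticks covered by report i (overlapping reports share columns; gap ticks are all-zero columns). The lifted solution is assumed to minimize ||A x - y||^2 + lam * ||x - x0||^2, a proximal least-squares objective whose closed form is x* = (A^T A + lam I)^{-1} (A^T y + lam x0). The names aggregation_matrix, solve and lift, the weight lam, and the example numbers are hypothetical illustrations, not the paper's method or data.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def aggregation_matrix(n: int, reports: Sequence[Tuple[int, int]]) -> Matrix:
    """Row i is 1.0 on the inclusive tick range (start, end) of report i.
    Overlapping reports share columns; uncovered ticks are all-zero columns."""
    return [[1.0 if start <= t <= end else 0.0 for t in range(n)]
            for start, end in reports]


def solve(M: Matrix, b: List[float]) -> List[float]:
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def lift(A: Matrix, y: List[float], x0: List[float], lam: float) -> List[float]:
    """ASSUMED objective (not necessarily the paper's):
    minimize ||A x - y||^2 + lam * ||x - x0||^2, solved in closed form as
    x* = (A^T A + lam I)^{-1} (A^T y + lam x0). With lam > 0 the system
    matrix is symmetric positive definite, so no iterations are needed."""
    n, m = len(x0), len(A)
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(A[k][i] * y[k] for k in range(m)) + lam * x0[i] for i in range(n)]
    return solve(M, b)


# Hypothetical example: 8 fine-grained ticks, two overlapping reports
# (ticks 2-3 are covered twice) and a gap at ticks 6-7.
A = aggregation_matrix(8, [(0, 3), (2, 5)])
y = [48.0, 36.0]   # reported aggregate counts
x0 = [10.0] * 8    # initial disaggregation from some existing method
x = lift(A, y, x0, lam=1.0)
print([round(v, 2) for v in x])
```

Because x* minimizes an objective whose value at x0 equals the initial disaggregation error, the lifted solution can never have a larger aggregate error than x0; ticks in a gap receive no evidence from the reports and stay at their initial values. Larger lam keeps x* closer to x0, smaller lam fits the reports more tightly.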

Updated: 2020-03-09