tsrobprep -- an R package for robust preprocessing of time series data
arXiv - CS - Mathematical Software. Pub Date: 2021-04-26, DOI: arxiv-2104.12657
Michał Narajewski, Jens Kley-Holsteg, Florian Ziel

Data cleaning is a crucial part of every data analysis exercise. Yet the currently available R packages do not provide fast and robust methods for cleaning and preparing time series data. The open-source package tsrobprep introduces efficient methods for handling missing values and outliers using model-based approaches. For data imputation, a probabilistic replacement model is proposed, which may consist of autoregressive components and external inputs. For outlier detection, a clustering algorithm based on finite mixture modelling is introduced, which uses typical time-series properties as features. By assigning each observation a probability of being an outlying data point, the degree of outlyingness can be determined. The methods are robust and fully tunable. Moreover, the auto_data_cleaning function allows the data preprocessing to be carried out in one step, without manual tuning, while providing suitable results. The primary motivation of the package is the preprocessing of energy system data; however, the package is also suited to other moderate and large-sized time series data sets. We present applications to electricity load, wind, and solar power data.
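
To make the idea of an outlier probability concrete, the following is a minimal sketch in plain Python, not the tsrobprep implementation. It assumes a single hand-picked feature (the deviation of each point from the median of its neighbours) and a two-component, zero-mean Gaussian mixture fitted by EM; the package itself is described as using several time-series features and its own mixture model. All function names and parameters here (`local_deviation`, `outlier_probabilities`, `half_window`, `n_iter`) are hypothetical and chosen for illustration only.

```python
import math
import random
import statistics


def local_deviation(series, half_window=3):
    """Illustrative feature: each value minus the median of its neighbours."""
    n = len(series)
    feats = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        neighbours = series[lo:i] + series[i + 1:hi]
        feats.append(series[i] - statistics.median(neighbours))
    return feats


def normal_pdf(x, sigma):
    """Density of a zero-mean normal distribution with scale sigma."""
    z = x / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def outlier_probabilities(feats, n_iter=50):
    """Posterior probability that each feature value belongs to the wide
    ("outlying") component of a two-component zero-mean Gaussian mixture."""
    eps = 1e-12
    # Robust starting scale (median absolute value, scaled as for a normal).
    s_in = max(1.4826 * statistics.median([abs(f) for f in feats]), 1e-6)
    s_out = 5.0 * s_in   # assumed starting guess: outliers are much wider
    pi_out = 0.05        # assumed starting guess for the outlier share
    resp = [0.0] * len(feats)
    for _ in range(n_iter):
        # E-step: responsibility of the wide component for each point.
        for i, f in enumerate(feats):
            num = pi_out * normal_pdf(f, s_out)
            den = num + (1.0 - pi_out) * normal_pdf(f, s_in)
            resp[i] = num / den if den > 0.0 else 1.0
        # M-step: update the mixing weight and both scales.
        r_sum = sum(resp)
        q_sum = len(feats) - r_sum
        pi_out = min(max(r_sum / len(feats), 1e-6), 1.0 - 1e-6)
        s_out = max(math.sqrt(sum(r * f * f for r, f in zip(resp, feats))
                              / (r_sum + eps)), 1e-6)
        s_in = max(math.sqrt(sum((1.0 - r) * f * f for r, f in zip(resp, feats))
                             / (q_sum + eps)), 1e-6)
    # Report the probability of the wider component, whichever one that is.
    return resp if s_out >= s_in else [1.0 - r for r in resp]


# Synthetic demo: a noisy daily-like cycle with two injected spikes.
random.seed(0)
series = [math.sin(2 * math.pi * t / 24) + random.gauss(0, 0.05)
          for t in range(200)]
series[50] += 3.0
series[120] -= 3.0

probs = outlier_probabilities(local_deviation(series))
flagged = [t for t, p in enumerate(probs) if p > 0.5]
print("indices with outlier probability above 0.5:", flagged)
```

Because every observation receives a probability rather than a hard label, the threshold (0.5 in the demo) is a tuning choice, which is one way to read the abstract's remark that the degree of outlyingness can be determined.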

Updated: 2021-04-27