Continuous decaying of telco big data with data postdiction
GeoInformatica ( IF 2.2 ) Pub Date : 2019-06-21 , DOI: 10.1007/s10707-019-00364-z
Constantinos Costa , Andreas Konstantinidis , Andreas Charalampous , Demetrios Zeinalipour-Yazti , Mohamed F. Mokbel

In this paper, we present two novel decaying operators for Telco Big Data (TBD), coined TBD-DP and CTBD-DP, that are founded on the notion of Data Postdiction. Unlike data prediction, which aims to make a statement about the future value of some tuple, data postdiction, as we formulate it, aims to make a statement about the past value of some tuple that no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary. Our proposed TBD-DP operator has the following two conceptual phases: (i) in an offline phase, it utilizes an LSTM-based hierarchical ML algorithm to learn a tree of models (coined the TBD-DP tree) over time and space; (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy. Additionally, we provide three decaying focus methods that can be plugged into the operators we propose, namely: (i) FIFO-amnesia, which is based on the time at which the tuple was created; (ii) SPATIAL-amnesia, which is based on the location of the cellular tower associated with the tuple; and (iii) UNIFORM-amnesia, which randomly picks the tuples to be decayed. Similarly, CTBD-DP enables the decaying of streaming data, utilizing the TBD-DP tree to extend and update the stored models. In our experimental setup, we measure the efficiency of the proposed operator using a ∼10GB anonymized real telco network trace. Our experimental results in TensorFlow over HDFS are extremely encouraging, as they show that TBD-DP saves an order of magnitude of storage space while maintaining high accuracy on the recovered data. Our experiments also show that CTBD-DP improves the accuracy over streaming data.
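
The three decaying focus methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Record` type, its field names, the parameter `k` (number of tuples to decay), and the idea of ranking SPATIAL-amnesia candidates by distance to a hypothetical focus point are all assumptions made purely for illustration. The abstract only states what each method is based on (creation time, cell tower location, random choice).

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical telco tuple; all field names are assumptions."""
    created_at: int                 # creation timestamp
    tower_xy: Tuple[float, float]   # location of the associated cell tower
    value: float                    # measured value carried by the tuple


def fifo_amnesia(records: List[Record], k: int) -> List[Record]:
    """Select the k oldest tuples (decay by creation time)."""
    return sorted(records, key=lambda r: r.created_at)[:k]


def spatial_amnesia(records: List[Record], k: int,
                    focus_xy: Tuple[float, float]) -> List[Record]:
    """Select the k tuples whose tower is closest to focus_xy.

    Ranking by distance to a focus point is an assumed policy; the
    abstract only says the method depends on the tower location.
    """
    def squared_distance(r: Record) -> float:
        dx = r.tower_xy[0] - focus_xy[0]
        dy = r.tower_xy[1] - focus_xy[1]
        return dx * dx + dy * dy

    return sorted(records, key=squared_distance)[:k]


def uniform_amnesia(records: List[Record], k: int,
                    seed: Optional[int] = None) -> List[Record]:
    """Select k tuples uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))
```

In each case the returned tuples are the candidates for decay; in the scheme the abstract describes, such tuples would be abstracted into a compact model before being deleted, a step this sketch does not cover.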

Updated: 2019-06-21