
Continuous decaying of telco big data with data postdiction

GeoInformatica

Abstract

In this paper, we present two novel decaying operators for Telco Big Data (TBD), coined TBD-DP and CTBD-DP, that are founded on the notion of Data Postdiction. Unlike data prediction, which aims to make a statement about the future value of some tuple, data postdiction, as we formulate it, aims to make a statement about the past value of some tuple that no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary. Our proposed TBD-DP operator has two conceptual phases: (i) in an offline phase, it utilizes an LSTM-based hierarchical ML algorithm to learn a tree of models (coined the TBD-DP tree) over time and space; (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy. Additionally, we provide three decaying focus methods that can be plugged into the operators we propose, namely: (i) FIFO-amnesia, which is based on the time at which the tuple was created; (ii) SPATIAL-amnesia, which is based on the location of the cellular tower associated with the tuple; and (iii) UNIFORM-amnesia, which randomly picks the tuples to be decayed. Similarly, CTBD-DP enables the decaying of streaming data, utilizing the TBD-DP tree to extend and update the stored models. In our experimental setup, we measure the efficiency of the proposed operator using a ∼10 GB anonymized real telco network trace. Our experimental results in TensorFlow over HDFS are extremely encouraging, as they show that TBD-DP saves an order of magnitude of storage space while maintaining high accuracy on the recovered data. Our experiments also show that CTBD-DP improves the accuracy over streaming data.
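The three decaying focus methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `TelcoTuple` record, its fields, the function names, and the idea of expressing SPATIAL-amnesia as a set of tower identifiers are all assumptions made for illustration. Each function only selects which tuples would be handed to a decaying operator; the model learning and recovery steps are out of scope here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class TelcoTuple:
    # All fields are hypothetical; the paper's actual schema is not given here.
    created_at: int   # creation time, e.g. epoch seconds
    cell_id: str      # identifier of the associated cellular tower
    value: float      # some measured quantity


def fifo_amnesia(tuples: Sequence[TelcoTuple], k: int) -> List[TelcoTuple]:
    """Select the k oldest tuples, by creation time."""
    return sorted(tuples, key=lambda t: t.created_at)[:k]


def spatial_amnesia(tuples: Sequence[TelcoTuple],
                    cells_to_decay: Set[str]) -> List[TelcoTuple]:
    """Select tuples whose tower lies in a chosen set of cells.

    Representing a location-based focus as a set of cell ids is an
    assumption of this sketch.
    """
    return [t for t in tuples if t.cell_id in cells_to_decay]


def uniform_amnesia(tuples: Sequence[TelcoTuple], k: int,
                    seed: Optional[int] = None) -> List[TelcoTuple]:
    """Select up to k tuples uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(tuples), min(k, len(tuples)))


if __name__ == "__main__":
    data = [
        TelcoTuple(created_at=3, cell_id="A", value=1.0),
        TelcoTuple(created_at=1, cell_id="B", value=2.0),
        TelcoTuple(created_at=2, cell_id="A", value=3.0),
    ]
    print(fifo_amnesia(data, 2))
    print(spatial_amnesia(data, {"A"}))
    print(uniform_amnesia(data, 2, seed=0))
```

Keeping the selection policy separate from the decaying step mirrors the abstract's description of the focus methods as components that can be plugged into the operators.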


Notes

  1. TBD Awareness, https://tbd.cs.ucy.ac.cy/


Author information

Corresponding author

Correspondence to Constantinos Costa.

About this article

Cite this article

Costa, C., Konstantinidis, A., Charalampous, A. et al. Continuous decaying of telco big data with data postdiction. Geoinformatica 23, 533–557 (2019). https://doi.org/10.1007/s10707-019-00364-z

  • DOI: https://doi.org/10.1007/s10707-019-00364-z
