Compact representations for efficient storage of semantic sensor data

Journal of Intelligent Information Systems

Abstract

The volume of sensor data generated by a wide variety of sensors and devices is increasing rapidly. Data semantics facilitate information exchange, adaptability, and interoperability among sensors and devices. Sensor data and their meaning can be described using ontologies, e.g., the Semantic Sensor Network (SSN) Ontology. However, once semantically enriched, sensor data are substantially larger than the raw sensor data. Moreover, sensors can observe the same measurement value several times, producing a huge number of repeated facts. We propose a compact, or factorized, representation of semantic sensor data in which repeated measurement values are described only once. These compact representations improve both the storage and the processing of semantic sensor data. To scale up to large datasets, factorization-based tabular representations are used to store and manage factorized semantic sensor data using Big Data technologies. We empirically study the effectiveness of the proposed compact representations of semantic sensor data and their impact on query processing. Additionally, we evaluate the effects of storing the proposed representations on diverse RDF implementations. Results suggest that the proposed compact representations improve the storage and query processing of sensor data over diverse RDF implementations, and can reduce query execution time by up to two orders of magnitude.
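The idea of describing a repeated measurement value only once can be illustrated with a small sketch. This is not the authors' algorithm; it is a simplified illustration under stated assumptions: each observation is assumed to carry exactly three facts (observed property, value, unit), and the names factorize, fact_counts, and the identifiers m0, m1 are hypothetical.

```python
# Illustrative sketch only: a toy factorization of repeated measurements.
# Assumption: every observation contributes three facts (property, value, unit).

def factorize(observations):
    """Map each distinct (property, value, unit) to one shared measurement id."""
    measurements = {}   # (property, value, unit) -> hypothetical measurement id
    links = []          # (observation id, measurement id)
    for obs_id, prop, value, unit in observations:
        key = (prop, value, unit)
        if key not in measurements:
            measurements[key] = f"m{len(measurements)}"
        links.append((obs_id, measurements[key]))
    return measurements, links


def fact_counts(observations):
    """Return (facts before factorization, facts after factorization)."""
    measurements, links = factorize(observations)
    original = 3 * len(observations)                 # 3 facts per observation
    factorized = 3 * len(measurements) + len(links)  # shared facts + one link each
    return original, factorized


# Five observations report the same temperature; one reports a different one.
observations = [
    ("o1", "temperature", 20.0, "Celsius"),
    ("o2", "temperature", 20.0, "Celsius"),
    ("o3", "temperature", 20.0, "Celsius"),
    ("o4", "temperature", 20.0, "Celsius"),
    ("o5", "temperature", 20.0, "Celsius"),
    ("o6", "temperature", 21.5, "Celsius"),
]

print(fact_counts(observations))  # (18, 12)
```

In this toy setting the saving grows with the amount of repetition: the cost of the shared facts is paid once per distinct measurement, while each observation only adds a single link.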

Notes

  1. http://wiki.knoesis.org/index.php/LinkedSensorData

  2. https://www.w3.org/2005/Incubator/ssn/

  3. https://www.w3.org/Submission/CBD/

  4. https://github.com/SDM-TIB/Compact-SSN-Data-Theorem

  5. https://parquet.apache.org/

  6. https://github.com/SDM-TIB/Compact-SSN-Data-Theorem

  7. Available at: http://wiki.knoesis.org/index.php/LinkedSensorData

  8. Available at: http://iot.ee.surrey.ac.uk:8080/datasets.html

  9. https://www.w3.org/wiki/SRBench

  10. Details can be found at https://sites.google.com/site/fssdexperimets/

  11. To run with a cold cache, we clear the cache before each query by running the command sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches"

  12. http://spark.apache.org/

  13. https://hadoop.apache.org/

  14. https://github.com/SDM-TIB/SemanticSensorDataFactorization

  15. https://parquet.apache.org/

References

  • Ali, M. I., Gao, F., & Mileo, A. (2015). Citybench: a configurable benchmark to evaluate rsp engines using smart city datasets. In International semantic web conference (pp. 374–389). Springer.

  • Álvarez-García, S., Brisaboa, N.R., Fernández, J.D., & Martínez-Prieto, M.A. (2011). Compressed k²-triples for full-in-memory rdf engines. arXiv:1105.4004.

  • Arenas, M., Gutierrez, C., & Pérez, J. (2009). Foundations of rdf databases. In Reasoning web. Semantic technologies for information systems (pp. 158–204). Springer.

  • Bakibayev, N., Olteanu, D., & Zavodny, J. (2012). FDB: A query engine for factorised relational databases. PVLDB, 5(11), 1232–1243.

  • Bakibayev, N., Kočiský, T., Olteanu, D., & Zavodny, J. (2013). Aggregation and ordering in factorised databases. PVLDB, 6(14), 1990–2001.

  • Bok, K., Han, J., Lim, J., & Yoo, J. (2019). Provenance compression scheme based on graph patterns for large rdf documents. The Journal of Supercomputing, pp. 1–23.

  • Brayton, R. K. (1987). Factoring logic functions. IBM Journal of Research and Development, 31(2), 187–198.

  • Brisaboa, N.R., Ladra, S., & Navarro, G. (2009). k²-trees for compact web graph representation. In International Symposium on String Processing and Information Retrieval (pp. 18–30). Springer. https://doi.org/10.1007/978-3-642-03784-9_3.

  • Compton, M., Barnaghi, P., Bermudez, L., García-Castro, R., Corcho, O., Cox, S., Graybeal, J., Hauswirth, M., Henson, C., Herzog, A., & et al. (2012). The ssn ontology of the w3c semantic sensor network incubator group. Web Semantics: Science, Services and Agents on the World Wide Web, 17, 25–32.

  • Copeland, G. P., & Khoshafian, S. N. (1985). A decomposition storage model. In Acm sigmod record (vol. 14, pp. 268–279). ACM. https://doi.org/10.1145/318898.318923.

  • Du, J. H., Wang, H. F., Ni, Y., & Yu, Y. (2012). Hadooprdf: a scalable semantic data analytical engine. In International conference on intelligent computing (pp. 633–641). Springer.

  • Endris, K. M., Galkin, M., Lytra, I., Mami, M. N., Vidal, M. E., & Auer, S. (2017). Mulder: querying the linked data web by bridging rdf molecule templates. In International conference on database and expert systems applications (pp. 3–18). Springer.

  • Fernández, J. D., Martínez-Prieto, M.A., Gutiérrez, C., Polleres, A., & Arias, M. (2013). Binary RDF representation for publication and exchange (HDT). J. Web Sem., 19, 22–41.

  • Fernández, J. D., Llaves, A., & Corcho, Ó. (2014). Efficient RDF interchange (ERI) format for RDF data streams. In The semantic web - ISWC 2014 (pp. 244–259).

  • Gaur, A., Scotney, B., Parr, G., & McClean, S. (2015). Smart city architecture and its applications based on iot. Procedia computer science, 52, 1089–1094.

  • Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, S., & Kersten, M. (2012). Monetdb: Two decades of research in column-oriented database. IEEE Data Engineering Bulletin.

  • Jabbar, S., Ullah, F., Khalid, S., Khan, M., & Han, K. (2017). Semantic interoperability in heterogeneous iot infrastructure for healthcare. Wireless Communications and Mobile Computing.

  • Joshi, A. K., Hitzler, P., & Dong, G. (2013). Logical linked data compression. In 10th extended semantic web conf. ESWC (pp. 170–184).

  • Karim, F., Mami, M. N., Vidal, M. E., & Auer, S. (2017). Large-scale storage and query processing for semantic sensor data. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (p. 8). ACM.

  • Khadilkar, V., Kantarcioglu, M., Thuraisingham, B., & Castagna, P. (2012). Jena-hbase: a distributed, scalable and efficient rdf triple store. In Proceedings of the 11th International Semantic Web Conference Posters & Demonstrations Track, ISWC-PD (vol. 12, pp. 85–88). Citeseer.

  • MacNicol, R., & French, B. (2004). Sybase iq multiplex-designed for analytics. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 1227–1230). VLDB Endowment.

  • Mami, M. N., Scerri, S., Auer, S., & Vidal, M. E. (2016). Towards semantification of big data technology. In International conference on big data analytics and knowledge discovery (pp. 376–390). Springer.

  • Meier, M. (2008). Towards rule-based minimization of rdf graphs under constraints. In International Conference on Web Reasoning and Rule Systems (pp. 89–103). Springer. https://doi.org/10.1007/978-3-540-88737-9_8.

  • Neumann, T., & Weikum, G. (2010). The rdf-3x engine for scalable management of rdf data. The VLDB Journal The International Journal on Very Large Data Bases, 19(1), 91–113.

  • Nie, Z., Du, F., Chen, Y., Du, X., & Xu, L. (2012). Efficient sparql query processing in mapreduce through data partitioning and indexing. In Asia-pacific web conference (pp. 628–635). Springer.

  • Pan, J. Z., Gómez-Pérez, J.M., Ren, Y., Wu, H., Wang, H., & Zhu, M. (2014). Graph pattern based RDF data compression. In 4th joint int. Conf. on semantic technology (JIST).

  • Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., & Koziris, N. (2013). H2RDF+: High-performance distributed joins over large-scale rdf graphs. In 2013 IEEE International conference on big data (pp. 255–263). IEEE.

  • Patni, H., Henson, C., & Sheth, A. (2010). Linked sensor data. In Collaborative technologies and systems (CTS), 2010 international symposium on (pp. 362–370). IEEE.

  • Pichler, R., Polleres, A., Skritek, S., & Woltran, S. (2010). Redundancy elimination on rdf graphs in the presence of rules, constraints, and queries. In International Conference on Web Reasoning and Rule Systems (pp. 133–148). Springer. https://doi.org/10.1007/978-3-642-15918-3_11.

  • Punnoose, R., Crainiceanu, A., & Rapp, D. (2012). Rya: a scalable rdf triple store for the clouds. In Proceedings of the 1st International Workshop on Cloud Intelligence (p. 4). ACM.

  • Schätzle, A., Przyjaciel-Zablocki, M., Dorner, C., Hornung, T., & Lausen, G. (2012). Cascading map-side joins over hbase for scalable join processing. In SSWS+ HPCSW@ ISWC (pp. 59–74).

  • Schätzle, A., Przyjaciel-Zablocki, M., Hornung, T., & Lausen, G. (2013). Pigsparql: a sparql query processing baseline for big data. In International semantic web conference (posters & demos) (vol. 1035, pp. 241–244).

  • Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., & et al. (2005). C-store: a column-oriented dbms. In Proceedings of Very large data bases (pp. 553–564). VLDB Endowment.

  • Ullman, J. D. (1984). Principles of database systems. Galgotia Publications.

  • Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., Ghodsi, A., Gonzalez, J., Shenker, S., & Stoica, I. (2016). Apache spark: a unified engine for big data processing. Commun. ACM, 59(11), 56–65. https://doi.org/10.1145/2934664.

  • Zukowski, M., Heman, S., Nes, N., & Boncz, P. A. (2006). Super-scalar ram-cpu cache compression. In Icde (vol. 6, pp. 59) https://doi.org/10.1109/ICDE.2006.150.

Acknowledgments

Farah Karim is supported by the German Academic Exchange Service (DAAD).

Author information

Corresponding author

Correspondence to Farah Karim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Karim, F., Vidal, ME. & Auer, S. Compact representations for efficient storage of semantic sensor data. J Intell Inf Syst 57, 203–228 (2021). https://doi.org/10.1007/s10844-020-00628-3

  • DOI: https://doi.org/10.1007/s10844-020-00628-3
