IoT streaming data integration from multiple sources
Computing (IF 3.7), Pub Date: 2020-07-08, DOI: 10.1007/s00607-020-00830-9
Doan Quang Tu , A. S. M. Kayes , Wenny Rahayu , Kinh Nguyen

The Internet of Things (IoT) has recently received considerable interest due to the development of smart technologies in today’s interconnected world. With the rapid advancement of Internet technologies and the proliferation of IoT sensors, myriad systems and applications generate data of a volume, variety and velocity that traditional databases and systems cannot manage effectively. Many organizations must handle such massive datasets, which comprise different types of data (e.g., IoT streaming data, static data) in different formats (e.g., structured, semi-structured) coming from multiple sources. Several data integration mechanisms have been designed, but mostly to process static data; these techniques cannot handle and integrate IoT streaming datasets from multiple sources. In this paper, we identify the challenges of IoT Streaming Data Integration (ISDI) and present a formal approach for the real-time integration of such IoT streaming datasets. We address one of the important issues in this setting: timing conflicts and alignment among streaming data coming from multiple sources. A generic window-based ISDI approach is proposed to deal with IoT data in different formats, and algorithms are developed to integrate IoT streaming data from multiple sources. In particular, we extend the basic windowing algorithm for real-time data integration and to handle the timing alignment issue. We also introduce a de-duplication algorithm to deal with data redundancy and to demonstrate the useful fragments of the integrated data. We conduct several sets of experiments and quantify the performance of our proposed window-based approach. In particular, we compare our local experimental results with a real streaming-data setup using Apache Spark. The results of the experiments, which are performed on several IoT datasets, show the efficiency of our proposed solution in terms of processing time. The results are also used to provide an integrated data view to the users.
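To make the idea of window-based alignment and de-duplication more concrete, here is a minimal sketch of the general technique. It is not the authors' algorithm: it simply assigns records from several sources to fixed-size (tumbling) event-time windows, so that readings with slightly different timestamps are aligned to the same window, and drops exact duplicate records. The `Reading` record type, its fields (`source`, `timestamp`, `sensor_id`, `value`), and the helper names `window_start` and `integrate` are hypothetical, and the input is assumed to be already normalized into one format and processed as a batch rather than as a live stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)  # frozen => hashable, so exact duplicates can be detected with a set
class Reading:
    source: str       # hypothetical: which IoT source produced the record
    timestamp: float  # hypothetical: event time in seconds
    sensor_id: str    # hypothetical: key shared across sources for the same entity
    value: float      # payload, assumed already converted to a common format


def window_start(ts: float, size: float) -> float:
    """Return the start of the tumbling window of length `size` that contains `ts`."""
    return (ts // size) * size


def integrate(
    readings: Iterable[Reading], size: float
) -> Dict[Tuple[float, str], Dict[str, List[float]]]:
    """Align multi-source readings into tumbling windows and drop exact duplicates.

    Returns an integrated view: {(window_start, sensor_id): {source: [values, ...]}}.
    """
    seen = set()  # exact records already integrated (assumed de-duplication rule)
    view: Dict[Tuple[float, str], Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    # Process in event-time order so each window's values appear chronologically.
    for r in sorted(readings, key=lambda rec: rec.timestamp):
        if r in seen:  # identical source, time, key and value: redundant copy
            continue
        seen.add(r)
        # Timing alignment: different timestamps map to the same window key.
        view[(window_start(r.timestamp, size), r.sensor_id)][r.source].append(r.value)
    # Convert nested defaultdicts into plain dicts for a stable, printable view.
    return {key: dict(per_source) for key, per_source in view.items()}
```

For example, with a 5-second window, a reading of sensor `s1` from source `A` at t = 0.5 s and one from source `B` at t = 1.2 s both land in the window starting at 0.0, while a retransmitted copy of the `A` record is discarded. A production system would instead run this logic incrementally on an unbounded stream and would need a policy for late-arriving data; in Apache Spark, Structured Streaming offers event-time `window` grouping and `dropDuplicates` for comparable purposes.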

Updated: 2020-07-08