A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment
Journal of Big Data ( IF 8.6 ) Pub Date : 2019-11-27 , DOI: 10.1186/s40537-019-0271-7
Abderrahmane Ed-daoudy , Khalil Maalmi

Among the technologies enabled by the Internet of Things (IoT) that have been used to prevent various chronic diseases, continuous real-time tracking systems are particularly important. Wearable medical devices with sensors, health clouds, and mobile applications continuously generate huge amounts of data, often called streaming big data. Because the data are generated at high speed, traditional methods, which are limited and time-consuming, struggle to collect, process, and analyze such massive data in real time, both to act immediately in emergencies and to extract hidden value. There is therefore a significant need for real-time big data stream processing that provides an effective and scalable solution. To address this issue, this work proposes a new architecture for a real-time health status prediction and analytics system using big data technologies. The system focuses on applying a distributed machine learning model to streaming health data events ingested into Spark Streaming through Kafka topics. First, we transform the standard decision tree (DT) algorithm (C4.5) into a parallel, distributed, scalable, and fast DT using Spark instead of Hadoop MapReduce, which is limited for real-time computing. Second, this model is applied to streaming data on various diseases arriving from distributed sources to predict health status. Based on several input attributes, the system predicts health status, sends an alert message to care providers, and stores the details in a distributed database for health data analytics and stream reporting. We compare the performance of the Spark DT against traditional machine learning tools, including Weka. Finally, performance evaluation parameters such as throughput and execution time are calculated to show the effectiveness of the proposed architecture.
The experimental results show that the proposed system can effectively process, and make predictions on, massive amounts of real-time IoT medical data from distributed sources and for various diseases.
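
The abstract names C4.5 as the base decision tree algorithm but does not describe it further. As a rough illustration only, standard C4.5 picks the attribute to split on by gain ratio (information gain divided by split information). The sketch below computes that criterion on a single machine in plain Python; it is not the authors' distributed Spark implementation, and the attribute names (`bp`, `smoker`), the labels, and the toy records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 gain ratio for splitting on the categorical attribute `attr`."""
    total = len(rows)
    # Group the class labels by the value each record takes for `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum((len(p) / total) * math.log2(len(p) / total)
                      for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs):
    """Return the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical toy records: attribute names and values are assumptions.
rows = [
    {"bp": "high", "smoker": "yes"},
    {"bp": "high", "smoker": "no"},
    {"bp": "normal", "smoker": "yes"},
    {"bp": "normal", "smoker": "no"},
]
labels = ["at_risk", "at_risk", "healthy", "healthy"]

chosen = best_attribute(rows, labels, ["bp", "smoker"])
print(chosen)
```

In this toy data `bp` separates the two classes perfectly while `smoker` carries no information, so the criterion selects `bp`. A distributed variant would need to compute the per-partition label counts across workers, which this sketch does not attempt.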

Updated: 2019-11-27