Using Big Data-machine learning models for diabetes prediction and flight delays analytics
Journal of Big Data (IF 8.1), Pub Date: 2020-09-17, DOI: 10.1186/s40537-020-00355-0
Thérence Nibareke, Jalal Laassiri

Introduction

Large volumes of data are now generated every day at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, keep growing. The tools and models used to process these data therefore have to be optimized. In this paper we apply and compare three Machine Learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes, and we perform analytics on flight delays. The main contribution of this paper is an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. For diabetes, we compare the performance of the three models. For flight delays, we produce a dashboard that can help airline managers get a 360° view of their flights and make strategic decisions.

Case description

We applied three Machine Learning algorithms to predict diabetes and compared their performance to see which model gives the best results. We also performed analytics on flight datasets to support decision making and to predict flight delays.
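The comparison described here can be illustrated with a minimal sketch. It is not the authors' pipeline: the labels, the model names `model_a` and `model_b`, and the helper `compare_models` are hypothetical, and the two metrics shown (accuracy and mean squared error) are only assumed to be comparable to the "accuracy" and "error" reported in the abstract.

```python
from typing import Dict, List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_squared_error(y_true: List[int], y_pred: List[int]) -> float:
    """Average squared difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_models(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> Dict[str, Dict[str, float]]:
    """Compute the same metrics for every model so they can be ranked."""
    return {
        name: {
            "accuracy": accuracy(y_true, y_pred),
            "error": mean_squared_error(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }


# Hypothetical data: 1 = diabetic, 0 = not diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 1],  # two mistakes
    "model_b": [1, 0, 1, 1, 0, 0, 1, 0],  # no mistakes
}

report = compare_models(y_true, predictions)
best = max(report, key=lambda name: report[name]["accuracy"])
for name, metrics in report.items():
    print(name, metrics)
print("best model by accuracy:", best)
```

Evaluating every model on the same labels with the same metric functions is what makes the numbers comparable; in practice the predictions would come from held-out data rather than a hand-written list.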

Discussion and evaluation

The experiment shows that Linear Regression, Naïve Bayes, and Decision Tree give the same accuracy (0.766), but Decision Tree outperforms the other two models with the highest score (1) and the smallest error (0). For the flight delay analytics, the model can show, for example, which airport recorded the most flight delays.
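As an illustration of the kind of flight delay query mentioned above, the following sketch counts delayed flights per origin airport. The `Flight` record, the airport codes, and the 15-minute delay threshold are assumptions made for this example; the abstract does not specify the dataset schema or how a delay is defined.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Flight(NamedTuple):
    """Hypothetical flight record; real datasets carry many more fields."""
    origin: str
    delay_minutes: float


def delays_by_airport(flights: Iterable[Flight], threshold: float = 15.0) -> Counter:
    """Count flights per origin airport whose delay exceeds the threshold.

    The 15-minute default is an assumption for this sketch.
    """
    counts: Counter = Counter()
    for flight in flights:
        if flight.delay_minutes > threshold:
            counts[flight.origin] += 1
    return counts


# Hypothetical sample data with placeholder airport codes.
flights = [
    Flight("AAA", 30),
    Flight("BBB", 5),
    Flight("AAA", 45),
    Flight("CCC", 20),
    Flight("BBB", 60),
    Flight("AAA", 0),
]

counts = delays_by_airport(flights)
worst_airport, worst_count = counts.most_common(1)[0]
print(f"{worst_airport} recorded the most delayed flights ({worst_count})")
```

The same per-airport counts could feed a dashboard chart; on a real Big Data stack the aggregation would typically be expressed as a group-by in the processing engine rather than a Python loop.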

Conclusions

Several tools and machine learning models for big data analytics have been discussed in this paper. We conclude that, even for the same dataset, the model used for prediction has to be chosen carefully. In future work, we will test different models in other fields (climate, banking, insurance).




Updated: 2020-09-18