Predictive analytics for blood glucose concentration: an empirical study using the tree-based ensemble approach
Library Hi Tech (IF 1.623), Pub Date: 2020-07-01, DOI: 10.1108/lht-08-2019-0171
Jiaming Liu, Liuan Wang, Linan Zhang, Zeming Zhang, Sicheng Zhang

Purpose

The primary objective of this study was to identify critical indicators for predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models: bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (AdaBoost-DT), random forest (RF) and gradient boosting decision tree (GBDT).
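To make the difference between the four ensembles concrete, the bagging variant (Bagging-DT) can be sketched in plain Python. This is an illustrative simplification, not the authors' implementation: it uses depth-1 regression trees (stumps) as base learners instead of full trees, and `fit_stump` and `bagging_fit` are hypothetical names. Each base learner is trained on a bootstrap resample and the ensemble averages their predictions; RF adds random feature subsetting at each split on top of this, whereas AdaBoost-DT and GBDT train their learners sequentially.

```python
import random


def fit_stump(X, y):
    """Fit a depth-1 regression tree (stump) by exhaustive search over
    every feature and every observed threshold, minimizing squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to the sample mean
        m = sum(y) / len(y)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Bagging: train each stump on a bootstrap resample, then average."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)


if __name__ == "__main__":
    # Toy one-feature data with a step at x = 10 (not study data).
    X = [[i] for i in range(20)]
    y = [0.0 if i < 10 else 10.0 for i in range(20)]
    model = bagging_fit(X, y)
    print(model([0]), model([19]))
```

Averaging over bootstrap resamples mainly reduces variance, which is why bagging is usually paired with high-variance base learners such as deep trees; stumps are used here only to keep the sketch short.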

Design/methodology/approach

This study proposed a majority voting feature selection method that combines lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select the indicators with the strongest predictive performance from an initial set of 38 indicators across 5,642 samples. The selected features were then used to build the tree-based ensemble models, and 10-fold cross-validation (CV) was used to evaluate the performance of each ensemble model.
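The majority voting step and the 10-fold CV split can be sketched in plain Python. This assumes a simple majority rule (a feature is kept when at least two of the three selectors, LR-AIC, LR-BIC and RF, pick it); the lasso and RF selectors themselves are not reproduced, so their outputs are passed in as sets. `majority_vote` and `k_fold_indices` are hypothetical helper names, and the feature names in the usage example are placeholders, not the study's results.

```python
import random
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Keep features chosen by at least `min_votes` selectors.

    `selections` is a list of feature-name sets, one per selector
    (e.g. the outputs of LR-AIC, LR-BIC and RF). By default a simple
    majority is required: 2 of 3 selectors.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    counts = Counter(f for s in selections for f in set(s))
    return {f for f, c in counts.items() if c >= min_votes}


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # near-equal, disjoint folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


if __name__ == "__main__":
    # Hypothetical selector outputs; feature names are placeholders.
    lr_aic = {"f1", "f2", "f3", "f4"}
    lr_bic = {"f1", "f2", "f5"}
    rf = {"f1", "f3", "f5", "f6"}
    print(sorted(majority_vote([lr_aic, lr_bic, rf])))  # ['f1', 'f2', 'f3', 'f5']
    for train, test in k_fold_indices(5642, k=10):
        print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every model is evaluated on all 5,642 samples once, and the spread of the ten fold scores gives a rough measure of stability.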

Findings

The feature selection results indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are the five most important clinical/physical indicators for BG prediction. Furthermore, the GBDT ensemble model combined with the proposed majority voting feature selection method outperformed the other three models in both prediction performance and stability.
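Since GBDT performed best, it may help to see the core boosting loop. The sketch below assumes squared-error loss, for which the negative gradient is simply the residual y - F(x), and uses stumps as base learners with a fixed shrinkage factor (`learning_rate`). The function names and hyperparameter values are hypothetical and are not the settings used in the study.

```python
def fit_stump(X, y):
    """Fit a depth-1 regression tree (stump) minimizing squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to the sample mean
        m = sum(y) / len(y)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def gbdt_fit(X, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss and stump base learners."""
    f0 = sum(y) / len(y)  # initial model: the constant that minimizes squared error
    preds = [f0] * len(y)
    trees = []
    for _ in range(n_estimators):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        # Add a shrunken copy of the new tree to the running predictions.
        preds = [pi + learning_rate * tree(row) for pi, row in zip(preds, X)]
    return lambda row: f0 + learning_rate * sum(t(row) for t in trees)


if __name__ == "__main__":
    # Toy one-feature data with a step at x = 10 (not study data).
    X = [[i] for i in range(20)]
    y = [0.0 if i < 10 else 10.0 for i in range(20)]
    model = gbdt_fit(X, y)
    print(model([0]), model([19]))
```

Each round fits a stump to the current residuals and adds a shrunken copy of it to the model, so on this toy step-function data the residual shrinks by a factor of (1 - learning_rate) per round. Unlike bagging, the trees are built sequentially, each correcting the errors left by the ones before it.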

Practical implications

This study proposed a novel BG prediction framework for better predictive analytics in health care.

Social implications

This study combined medical domain knowledge with machine learning techniques, with the aim of helping to reduce diabetes morbidity and to formulate precise medical treatment plans.

Originality/value

The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.

Updated: 2020-07-01