A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus, Applied Intelligence


A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus
Applied Intelligence (IF 3.4), Pub Date: 2021-06-10, DOI: 10.1007/s10489-021-02533-w
Haohui Lu, Shahadat Uddin, Farshid Hajati, Mohammad Ali Moni, Matloob Khushi

In recent years, the prevalence of chronic diseases such as type 2 diabetes mellitus (T2DM) has increased, placing a heavy burden on healthcare systems. While regular monitoring of patients is expensive and impractical, understanding chronic disease progression and identifying patients at risk of developing comorbidities are crucial. This research used a real-world administrative claims dataset on T2DM to develop a disease prediction approach that combines a novel patient network with machine learning. Healthcare data for 1,028 T2DM patients and 1,028 non-T2DM patients were extracted from the de-identified dataset to predict the risk of T2DM. The proposed model is based on the ‘patient network’, which uses graph theory to represent the underlying relationships among health conditions for a group of patients diagnosed with the same disease. In addition to patients’ socio-demographic and behavioural characteristics, attributes of the ‘patient network’ (e.g., centrality measures) capture latent patient features that are effective in risk prediction. We apply eight machine learning models (Logistic Regression, K-Nearest Neighbours, Support Vector Machine, Naïve Bayes, Decision Tree, Random Forest, XGBoost and Artificial Neural Network) to the extracted features to predict chronic disease risk. Extensive experiments show that the classifiers in the proposed framework achieve an Area Under the Curve (AUC) ranging from 0.79 to 0.91. The Random Forest model outperformed the other models, and eigenvector centrality and closeness centrality of the network, together with patient age, are the most important features for that model. This performance suggests promising potential applications in healthcare services. We also provide strong evidence that the extracted latent features are essential in disease risk prediction. The proposed approach offers vital insight into chronic disease risk prediction that could benefit healthcare service providers and their stakeholders.
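To make the two network features named in the abstract concrete, the following is a minimal sketch of how closeness centrality and eigenvector centrality can be computed for each node of a small undirected graph. It is not the authors' implementation: the abstract does not specify how the patient network is constructed, so the toy edge list, the node labels, and the helper names `build_adjacency`, `closeness_centrality`, and `eigenvector_centrality` are all assumptions made for illustration. Only the Python standard library is used.

```python
from collections import deque
from math import sqrt

# Hypothetical toy network (NOT from the paper): nodes and edges are
# invented purely to illustrate the two centrality measures.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return {node: set(neighbours)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness_centrality(adj):
    """Closeness = (number of reachable nodes) / (sum of shortest-path
    distances to them), using breadth-first search on unweighted edges."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I), normalised to unit Euclidean length.
    Adding the identity keeps the same leading eigenvector as A while
    avoiding oscillation on bipartite graphs."""
    scores = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: scores[n] + sum(scores[nb] for nb in adj[n]) for n in adj}
        norm = sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - scores[n]) for n in adj) < tol:
            return new
        scores = new
    return scores


adjacency = build_adjacency(EDGES)
closeness = closeness_centrality(adjacency)
eigenvector = eigenvector_centrality(adjacency)

# One feature row per node; in a full pipeline these columns would sit
# alongside socio-demographic attributes such as age before classification.
features = {n: (closeness[n], eigenvector[n]) for n in sorted(adjacency)}
for node, (c, e) in features.items():
    print(f"{node}: closeness={c:.3f}, eigenvector={e:.3f}")
```

In this toy graph, node C has the highest score on both measures because it bridges the triangle A-B-C and the chain C-D-E. Feature rows of this kind could then be passed to any of the eight classifiers listed in the abstract.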


Updated: 2021-06-10
