Modeling Large Sparse Data for Feature Selection: Hospital Admission Predictions of the Dementia Patients using Primary Care Electronic Health Records
IEEE Journal of Translational Engineering in Health and Medicine (IF 3.7) Pub Date: 2021-01-01, DOI: 10.1109/jtehm.2020.3040236
Gavin Tsang 1 , Shang-Ming Zhou 2 , Xianghua Xie 1

A growing elderly population suffering from incurable, chronic conditions such as dementia presents a continual strain on medical services, because mental impairment paired with high comorbidity increases hospitalization risk. Identifying at-risk individuals allows preventative measures to alleviate this strain. Electronic health records provide an opportunity for big data analysis to address such applications. Such data, however, present a challenging problem space for traditional statistics and machine learning because of their high dimensionality and sparse data elements. This article proposes a novel machine learning methodology, entropy regularization with ensemble deep neural networks (ECNN), which provides high predictive performance for hospitalization of patients with dementia while enabling an interpretable heuristic analysis of the model architecture that can identify individual features of importance within a large feature domain space. Experimental results on health records containing 54,647 features identified 10 event indicators within a patient timeline: a collection of diagnostic events, medication prescriptions and procedural events, the highest ranked being essential hypertension. Despite the significant reduction in feature size, the resulting subset still provided highly competitive hospitalization prediction (Accuracy: 0.759) compared with the full feature domain (Accuracy: 0.755) or traditional feature selection techniques (Accuracy: 0.737). The discovery and heuristic evidence of correlation support further clinical study of these medical events as potential novel indicators. There also remains great potential for adaptation of ECNN within other medical big data domains as a data mining tool for novel risk factor identification.
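The abstract does not give the form of the entropy regularizer, so the following is only a generic illustration of why an entropy term can encourage feature selection, not the authors' ECNN implementation. It assumes a hypothetical vector of per-feature input weights, normalizes their magnitudes into a distribution, and computes its Shannon entropy: the more the weight mass concentrates on a few features, the lower the entropy. The function names and toy weight vectors are invented for this sketch.

```python
import math


def normalized_importance(weights):
    """Turn raw per-feature weights into a distribution over features.

    Assumption for this sketch: importance is the absolute weight,
    normalized so that all importances sum to 1.
    """
    magnitudes = [abs(w) for w in weights]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return [m / total for m in magnitudes]


def entropy_penalty(weights):
    """Shannon entropy (in nats) of the normalized weight magnitudes.

    Adding a term like this to a training loss would reward weight
    vectors that concentrate on few features (low entropy).
    """
    probs = normalized_importance(weights)
    return -sum(p * math.log(p) for p in probs if p > 0)


def top_k_features(weights, k):
    """Indices of the k features with the largest absolute weight."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return ranked[:k]


# Toy weight vectors (hypothetical, not taken from the paper).
uniform = [1.0, 1.0, 1.0, 1.0]       # every feature equally weighted
concentrated = [5.0, 0.1, 0.1, 0.1]  # one dominant feature

if __name__ == "__main__":
    print("uniform entropy:", entropy_penalty(uniform))
    print("concentrated entropy:", entropy_penalty(concentrated))
    print("top feature:", top_k_features(concentrated, 1))
```

In this toy setting the concentrated vector has lower entropy than the uniform one, and ranking by absolute weight recovers the dominant feature. That is the kind of heuristic reading of model weights the abstract describes, here reduced to four features instead of 54,647.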

Updated: 2021-01-01