Curvature-based feature selection with application in classifying electronic health records
Technological Forecasting and Social Change (IF 12.0), Pub Date: 2021-09-08, DOI: 10.1016/j.techfore.2021.121127
Zheming Zuo 1 , Jie Li 2 , Han Xu 3, 4 , Noura Al Moubayed 1

Disruptive technologies, from the adoption of the Internet of Things through to Machine Learning (ML) techniques, provide unparalleled opportunities to improve many aspects of pervasive healthcare. As a powerful tool, ML has been widely applied in patient-centric healthcare solutions. To further improve the quality of patient care, Electronic Health Records (EHRs) are commonly adopted in healthcare facilities for analysis. Applying AI and ML to analyse these EHRs for prediction and diagnostics is a crucial task, given their highly unstructured, unbalanced, incomplete, and high-dimensional nature. Dimensionality reduction is a common data preprocessing technique for coping with high-dimensional EHR data; it aims to reduce the number of features in the EHR representation while improving the performance of the subsequent data analysis, e.g. classification. In this work, an efficient filter-based feature selection method, namely Curvature-based Feature Selection (CFS), is presented. The proposed CFS applies the concept of Menger Curvature to rank the weights of all features in a given data set. The performance of the proposed CFS has been evaluated on four well-known EHR data sets: Cervical Cancer Risk Factors (CCRFDS), Breast Cancer Coimbra (BCCDS), Breast Tissue (BTDS), and Diabetic Retinopathy Debrecen (DRDDS). The experimental results show that the proposed CFS achieves state-of-the-art performance on these data sets compared with conventional PCA and other recent approaches. The source code of the proposed approach is publicly available at https://github.com/zhemingzuo/CFS.
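
The abstract does not spell out how Menger Curvature is turned into feature weights, so the following is only a minimal sketch of the general idea, not the authors' implementation (the linked repository is the reference for that). The Menger curvature of three points is the reciprocal of the radius of the circle through them, computed as four times the triangle area divided by the product of the three side lengths. The function `rank_features`, and in particular the choices to plot each feature against the class label, to average curvature over consecutive triples of sorted points, and to rank higher mean curvature first, are assumptions made purely for illustration.

```python
import math


def menger_curvature(p, q, r):
    """Menger curvature of three 2D points: 4 * area / (|pq| * |qr| * |rp|)."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    # The cross product gives twice the signed area of the triangle.
    cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
    a = math.dist(p, q)
    b = math.dist(q, r)
    c = math.dist(r, p)
    if a == 0 or b == 0 or c == 0:
        # Degenerate triple (a repeated point); treated as flat in this sketch.
        return 0.0
    return 2.0 * abs(cross) / (a * b * c)


def rank_features(rows, labels):
    """Hypothetical ranking: score each feature by its mean Menger curvature.

    Assumption (not taken from the paper): each feature is paired with the
    class label to form 2D points, the points are sorted, and the curvature
    is averaged over consecutive triples.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pts = sorted((float(row[j]), float(y)) for row, y in zip(rows, labels))
        curvatures = [
            menger_curvature(p, q, r)
            for p, q, r in zip(pts, pts[1:], pts[2:])
        ]
        scores.append(sum(curvatures) / len(curvatures) if curvatures else 0.0)
    # Assumed convention: higher mean curvature ranks first.
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
    y = [0, 1, 0, 1]
    order, scores = rank_features(X, y)
    print("feature order:", order)
    print("scores:", scores)
```

Because the score depends on distances, features measured on very different scales would likely need normalising before such a ranking is meaningful; that step is omitted here.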




Updated: 2021-09-08